What is Stagehand?

Stagehand allows you to automate browsers with natural language and code.

You can use Stagehand to do anything a web browser can do! Browser automations written with Stagehand are designed to be repeatable, customizable, and maintainable.

A complete browser automation can be written in just a few lines of code with Stagehand:

javascript

// Assumes `stagehand` has been initialized and `z` is imported from "zod"
const page = stagehand.page;
await page.goto("https://docs.stagehand.dev");

// Use act() to take an action on the page
await page.act("Click the search box");

// Use observe() to plan an action before doing it
const [action] = await page.observe(
  "Type 'Tell me in one sentence why I should use Stagehand' into the search box"
);
await page.act(action);

// Cache actions to avoid redundant LLM calls!
// (actWithCache is a user-defined helper, not a built-in Stagehand method)
await actWithCache(page, "Click the suggestion to use AI");
await page.waitForTimeout(2000);

// Use extract() to extract structured data from the page
const { text } = await page.extract({
  instruction: "extract the text of the AI suggestion from the search results",
  schema: z.object({
    text: z.string(),
  }),
});
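
The `actWithCache` helper used above is not defined in the snippet. A minimal sketch of what such a helper might look like follows, assuming an in-memory `Map` as the cache. `PageLike` and `ObservedAction` are simplified, hypothetical stand-ins for the Stagehand page and its observed actions; the real `observe()` and `act()` signatures may differ.

```typescript
// Hypothetical, simplified shape of an action planned by observe().
interface ObservedAction {
  selector: string;
  method: string;
}

// Hypothetical stand-in for the parts of the Stagehand page used here.
interface PageLike {
  observe(instruction: string): Promise<ObservedAction[]>;
  act(action: ObservedAction): Promise<void>;
}

// In-memory cache keyed by the natural-language instruction.
const actionCache = new Map<string, ObservedAction>();

async function actWithCache(page: PageLike, instruction: string): Promise<void> {
  let action = actionCache.get(instruction);
  if (action === undefined) {
    // Cache miss: plan the action once via observe() (the LLM call).
    const [planned] = await page.observe(instruction);
    if (planned === undefined) {
      throw new Error(`No action found for: ${instruction}`);
    }
    action = planned;
    actionCache.set(instruction, action);
  }
  // Execute the planned action; on a cache hit no LLM call is needed.
  await page.act(action);
}
```

A persistent store (a JSON file or a database) could replace the `Map` if the cache needs to survive between runs.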

To sidestep the limitations of fully agentic tools, Stagehand exposes Playwright's page and context objects, giving you full control over the browser session.

Stagehand works on any Chromium-based browser (Chrome, Edge, Arc, Dia, Brave, etc.). It is built and maintained by the Browserbase team.

For best results, we strongly recommend using Stagehand on a Browserbase browser.

How are things done with AI agents?

The simple answer is that existing solutions are either too brittle or too agentic.

You might’ve heard of OpenAI Operator, which is a web agent that uses Playwright to take actions on a website.

While OpenAI Operator is a great tool, it is completely agentic; agents leave you at the mercy of AI to do the right thing over a large number of tasks. Agents are fundamentally designed for one-shotting tasks, not repeatability.

Put simply, you can’t control what an agent does.

How do Puppeteer and Playwright work?

Scripts for these tools are not only tedious and cumbersome to write, they are also brittle. If you don’t own the website, you can’t control what the DOM looks like.

As a result, Playwright, Puppeteer, and Selenium force you to write brittle code that breaks when the website makes even a slight UI change.

How does Stagehand make this more user-friendly?

By combining agents, tools, and Playwright, Stagehand lets you write deterministic code that is resilient to unpredictable DOM changes.

Repeatability: Write code that can be repeated exactly the same way every time.
Resilience: Write code that is resilient to unpredictable DOM changes.

It lets you build browser automations that are as simple or as complex as you want, like the example above.

Installing Stagehand…

Add Stagehand to an existing Node.js project

We highly recommend using the Node.js runtime environment to run Stagehand scripts, as opposed to newer alternatives like Deno or Bun.
Bun does not support Stagehand since it doesn’t support Playwright.

We strongly recommend using Stagehand in a new project with npx create-browser-app. Check out our quickstart guide to get started.

However, if you have an existing project, you can install Stagehand by installing the @browserbasehq/stagehand package and zod (for structured output).

bash

npm install @browserbasehq/stagehand zod

You may also need to install the Playwright browser to run your Stagehand scripts, especially if you’re running locally.

bash

# Useful for local development
npx playwright install

Then, you can use Stagehand in your project by importing the Stagehand class.

typescript

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

async function main() {
	const stagehand = new Stagehand({
		/**
		 * With npx create-browser-app, this config is found 
		 * in a separate stagehand.config.ts file
		*/
		env: "LOCAL",
		modelName: "gpt-4o",
		modelClientOptions: {
			apiKey: process.env.OPENAI_API_KEY,
		},
	});
	await stagehand.init();

	const page = stagehand.page;

	await page.goto("https://www.google.com");
	await page.act("Type in 'Browserbase' into the search bar");

	const { title } = await page.extract({
		instruction: "The title of the first search result",
		schema: z.object({
			title: z.string(),
		}),
	});
	console.log(title);

	await stagehand.close();
}

main();

Building a Web Browsing Agent

Automating websites in 2025 isn’t just about clicking buttons — it’s about intelligently navigating unpredictable interfaces, handling dynamic forms, and extracting structured data. In this article, we’ll walk through how we built a hybrid AI-assisted web browsing agent to automate Maersk’s shipping portal, using tools like Stagehand, Claude AI, Playwright, and Zod for validation.

🧩 System Overview

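We’re using the following components: Stagehand for browser control, Claude as the LLM agent, Playwright for deterministic browser actions, and Zod for validation.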

🛠️ Initialisation and Architecture

The class MaerskCarrier is the entry point. It initialises everything from browser context to AI agents:

javascript

this.stagehand = new Stagehand({
  env: "LOCAL",
  modelName: process.env.MODEL_NAME,
  headless: false,
  args: ['--disable-web-security', ...]
});

Why Stagehand?

Stagehand provides a high-level wrapper around Playwright, while supporting LLM-enhanced agents like Claude. This hybrid approach combines deterministic browser control with natural-language fallbacks — essential for modern websites with:

Lazy-loaded DOMs
Shadow DOM-based elements
Multi-step forms with React event bindings

🔐 Hybrid Login Strategy

Login flows are notoriously non-deterministic. Maersk might present a cookie popup, a dynamic button labeled "Sign in," or redirect through a subdomain. Hence, we first attempt:

Browser-first actions:

typescript

await this.page.click('text="Login"');
await this.page.fill('input[name="username"]', this.username);

If that fails → Claude fallback:

This improves reliability. Claude can parse visual layouts and work around the selector issues that static automation tools struggle with.

We also wrap cookie handling and dynamic modal closures inside try/catch, giving Claude a chance to intervene if the page isn’t behaving as expected.
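
The browser-first, agent-second pattern can be sketched as a small helper. This is an illustration only: `withFallback` and its two callbacks are hypothetical names, not part of Stagehand or of the `MaerskCarrier` class.

```typescript
// Which strategy ended up completing the step.
type Strategy = "browser" | "agent";

// Try the deterministic browser action first; if it throws
// (for example, a selector is not found), hand the same goal to the agent.
async function withFallback(
  browserAction: () => Promise<void>,
  agentAction: () => Promise<void>,
): Promise<Strategy> {
  try {
    await browserAction();
    return "browser";
  } catch (error) {
    console.warn("Browser action failed, falling back to agent:", error);
    await agentAction();
    return "agent";
  }
}
```

In the login flow, `browserAction` would wrap the `page.click` and `page.fill` calls, and `agentAction` would wrap a natural-language instruction to the agent. Returning the strategy makes it easy to log how often the fallback is actually needed.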

🧠 Claude Agent: Natural Language as Automation Logic

Claude agents are used in three main areas:

Complex UI interpretation (e.g., finding dynamic form fields)
Sequential tasks (e.g., fill origin → destination → container)
Data extraction from semi-structured HTML

Instead of writing 100 selectors or XPath expressions, you just describe the task:

javascript

await this.agent.execute(`
  Fill the Maersk form:
  FROM: Chennai
  TO: Rotterdam
  Use dropdowns and ensure correct selection.
`);

This is powerful in brittle UIs that may change labels but not underlying structure.

📄 Page Verification using Zod

After login, we verify the state of the page using Claude and Zod schema validation.

javascript

z.object({
  hasBookingForm: z.boolean(),
  elementsFound: z.array(z.string()),
  formReadyForInput: z.boolean()
});

This allows us to semantically verify whether we’ve reached the booking form — not just by checking the URL but also by analyzing the page content intelligently.
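
To make the check concrete, here is a dependency-free sketch of the same validation written as a plain TypeScript type guard. In the actual flow Zod does this work; `PageState`, `isPageState`, and `isReadyForBooking` are hypothetical names used only for illustration.

```typescript
// Mirrors the three fields of the schema above.
interface PageState {
  hasBookingForm: boolean;
  elementsFound: string[];
  formReadyForInput: boolean;
}

// Runtime check that an unknown value has the expected shape.
function isPageState(value: unknown): value is PageState {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const elements = record["elementsFound"];
  return (
    typeof record["hasBookingForm"] === "boolean" &&
    typeof record["formReadyForInput"] === "boolean" &&
    Array.isArray(elements) &&
    elements.every((element: unknown) => typeof element === "string")
  );
}

// Only proceed when the shape is valid AND the form is present and ready.
function isReadyForBooking(value: unknown): boolean {
  return isPageState(value) && value.hasBookingForm && value.formReadyForInput;
}
```

Separating "is the shape valid" from "is the page ready" keeps a malformed model response from being mistaken for a page that simply has no booking form.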

📦 Booking Form Fill with Optimization

We handle booking in three structured steps:

1. Origin/Destination Fields

javascript

await this.agent.execute(`
  Fill origin: ${origin}
  Fill destination: ${destination}
  Select both from dropdowns.
`);

2. Commodity + Container Details

This is treated as one cohesive form segment:

javascript

await this.agent.execute(`
  Set commodity to: Machinery
  Set container: 20ft, Dry, 2 units
`);

3. Transport & Final Form Setup

Claude fills remaining fields (weight, pickup/delivery type, date), ensuring that dropdown dependencies are respected. This improves resilience to Maersk’s reactive UI changes.

🚀 Submitting the Form

We try browser-first again:

javascript

await this.page.act({
  method: "click",
  selector: "button:has-text('Continue')"
});

Fallback if needed:

javascript

await this.agent.execute("Click the main button to proceed to shipping rates");

What are some of the common acts used in Stagehand?

Stagehand maps natural language to Playwright actions such as click and fill, so each action can be triggered with a natural-language command.

This modular structure lets us debug more effectively: every action has a retry mechanism, and a screenshot is taken on failure.
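
A retry wrapper with a failure hook might look like the following sketch. `withRetry`, `RetryOptions`, and `onFailure` are hypothetical names; the hook is where a screenshot call would go, and the original implementation may differ.

```typescript
interface RetryOptions {
  retries: number; // extra attempts after the first one fails
  delayMs: number; // pause between attempts
  // Called after every failed attempt, e.g. to capture a screenshot.
  onFailure?: (error: unknown, attempt: number) => Promise<void>;
}

async function withRetry<T>(
  action: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.retries + 1; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      if (options.onFailure) await options.onFailure(error, attempt);
      // Wait before the next attempt, but not after the final one.
      if (attempt <= options.retries) {
        await new Promise<void>((resolve) => setTimeout(resolve, options.delayMs));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

Each browser or agent step can then be wrapped in `withRetry`, with `onFailure` saving a screenshot named after the attempt number.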

📊 Rate Extraction with Structured Schema

Once we reach the rates page, we perform:

1. Natural-language overview

javascript

await this.agent.execute("Summarize all available shipping options");

2. Structured data extraction

javascript

z.object({
  rates: z.array(z.object({
    serviceName: z.string(),
    price: z.string().nullable(),
    transitTime: z.string().nullable()
  }))
})

This lets us ingest the results directly into APIs or databases for further processing.
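
Before ingestion, the extracted strings usually need normalizing. The sketch below converts the nullable `price` string into a number. It assumes prices look like `USD 1,250.00` (comma as thousands separator, dot as decimal point), which is an assumption rather than something the source confirms; `parsePrice`, `toRateRows`, and `RateRow` are hypothetical names.

```typescript
// Shape produced by the extraction schema above.
interface Rate {
  serviceName: string;
  price: string | null;
  transitTime: string | null;
}

// Normalized row, ready for an API payload or a database insert.
interface RateRow {
  serviceName: string;
  priceAmount: number | null;
  transitTime: string | null;
}

// Assumes "," is a thousands separator and "." is the decimal point.
function parsePrice(price: string | null): number | null {
  if (price === null) return null;
  const digits = price.replace(/[^0-9.]/g, "");
  const amount = parseFloat(digits);
  return isNaN(amount) ? null : amount;
}

function toRateRows(rates: Rate[]): RateRow[] {
  return rates.map((rate) => ({
    serviceName: rate.serviceName,
    priceAmount: parsePrice(rate.price),
    transitTime: rate.transitTime,
  }));
}
```

Keeping unparseable prices as `null` instead of dropping the row preserves the service name and transit time for later inspection.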

📼 Replay and Debugging

All actions performed by the Claude agent are recorded. We can generate a replay script that re-runs the entire flow deterministically:

javascript

await this.saveActionHistory('maersk_replay.js');

This is especially helpful during testing or regression validation.
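
The internals of `saveActionHistory` are not shown here. As a rough sketch of the idea, a recorder could keep a list of executed actions and render them as plain Playwright calls. `ActionRecorder` and `RecordedAction` are hypothetical, and only click and fill actions are covered.

```typescript
// The two action kinds this sketch knows how to replay.
type RecordedAction =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

class ActionRecorder {
  private readonly history: RecordedAction[] = [];

  // Call this each time an action has been performed successfully.
  record(action: RecordedAction): void {
    this.history.push(action);
  }

  // Render the history as one Playwright call per line.
  // JSON.stringify handles quoting and escaping of selectors and values.
  toReplayScript(): string {
    return this.history
      .map((action) => {
        const selector = JSON.stringify(action.selector);
        return action.kind === "click"
          ? `await page.click(${selector});`
          : `await page.fill(${selector}, ${JSON.stringify(action.value)});`;
      })
      .join("\n");
  }
}
```

Because the replay consists of concrete selectors rather than prompts, it runs without any LLM calls, which is what makes it deterministic.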

🧠 Rate-Limiting Claude

Claude API calls are rate-limited:

javascript

if (timeSinceLastCall < this.claudeRateLimit) {
  logger.warning('Rate limiting Claude call...');
}

A simple Date.now() diff ensures that we don’t overload Claude or run into 429s. This keeps things stable and production-safe.
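
The snippet above only logs a warning. A fuller sketch of the same `Date.now()` diff, one that actually waits out the remaining interval, could look like this. `RateLimiter` is a hypothetical class, and the clock and sleep functions are injectable purely so the logic can be tested without real delays.

```typescript
class RateLimiter {
  private lastCall: number | null = null;
  private readonly minIntervalMs: number;
  private readonly now: () => number;
  private readonly sleep: (ms: number) => Promise<void>;

  constructor(
    minIntervalMs: number,
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = (ms) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.sleep = sleep;
  }

  // Waits until minIntervalMs has passed since the previous call.
  // Returns how many milliseconds were spent waiting.
  async wait(): Promise<number> {
    let waitMs = 0;
    if (this.lastCall !== null) {
      const timeSinceLastCall = this.now() - this.lastCall;
      waitMs = Math.max(0, this.minIntervalMs - timeSinceLastCall);
    }
    if (waitMs > 0) await this.sleep(waitMs);
    this.lastCall = this.now();
    return waitMs;
  }
}
```

Calling `await limiter.wait()` immediately before each Claude request spaces the requests at least `minIntervalMs` apart.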

📘 Final Thoughts

Building a Claude-assisted web browsing agent is not just about scripting; it's about building an intelligent interaction layer on top of unpredictable UIs. With tools like Stagehand, Zod, and Claude, you’re not just automating clicks — you’re enabling autonomous decision-making in the browser.

This hybrid approach scales well across portals like Maersk, Hapag-Lloyd, and others — especially when UI changes frequently or uses JavaScript-heavy interfaces.
