What is Stagehand?
Stagehand allows you to automate browsers with natural language and code.
You can use Stagehand to do anything a web browser can do! Browser automations written with Stagehand are designed to be repeatable, customizable, and maintainable.
An entire browser automation can be written in just a few lines of code with Stagehand:
const page = stagehand.page;
await page.goto("https://docs.stagehand.dev");
// Use act() to take an action on the page
await page.act("Click the search box");
// Use observe() to plan an action before doing it
const [action] = await page.observe(
  "Type 'Tell me in one sentence why I should use Stagehand' into the search box"
);
await page.act(action);
// Cache actions to avoid redundant LLM calls!
await actWithCache(page, "Click the suggestion to use AI");
await page.waitForTimeout(2000);
// Use extract() to extract structured data from the page
const { text } = await page.extract({
  instruction: "extract the text of the AI suggestion from the search results",
  schema: z.object({
    text: z.string(),
  }),
});
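The `actWithCache` helper used above is not defined in the snippet. One way it could look is sketched below, assuming `observe()` returns a list of serializable actions that `act()` accepts, as the snippet implies. The `CachablePage` and `ObservedAction` types and the in-memory `Map` are illustrative stand-ins, not part of Stagehand's API.

```typescript
// Illustrative stand-ins for the parts of the page object this sketch needs.
type ObservedAction = { selector: string; method?: string; arguments?: string[] };

interface CachablePage {
  observe(instruction: string): Promise<ObservedAction[]>;
  act(action: string | ObservedAction): Promise<unknown>;
}

// In-memory cache keyed by instruction; a real project might persist this to disk.
const actionCache = new Map<string, ObservedAction>();

async function actWithCache(page: CachablePage, instruction: string): Promise<void> {
  const cached = actionCache.get(instruction);
  if (cached) {
    // Cache hit: replay the stored action without asking the LLM again.
    await page.act(cached);
    return;
  }
  // Cache miss: let observe() plan the action, store it, then run it.
  const [action] = await page.observe(instruction);
  if (!action) {
    throw new Error(`No action found for instruction: ${instruction}`);
  }
  actionCache.set(instruction, action);
  await page.act(action);
}
```

The second call with the same instruction skips `observe()` entirely, which is where the LLM cost would normally be incurred.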
To completely avoid the limitations of AI agents, Stagehand borrows the page and context objects from Playwright to give you full control over the browser session.
Stagehand works on any Chromium-based browser (Chrome, Edge, Arc, Dia, Brave, etc.). It is built and maintained by the Browserbase team.
For best results, we strongly recommend using Stagehand on a Browserbase browser.
How are things done with AI agents?
The simple answer is that existing solutions are either too brittle or too agentic.
You might’ve heard of OpenAI Operator, a web agent that takes actions on websites for you.
While OpenAI Operator is a great tool, it is completely agentic; agents leave you at the mercy of AI to do the right thing over a large number of tasks. Agents are fundamentally designed for one-shotting tasks, not repeatability.
Put simply, you can’t control what an agent does.

How do Puppeteer and Playwright work?
Not only are these tools tedious and cumbersome to write, but they are also brittle. If you don’t own the website, you can’t control what the DOM looks like.
As a result, Playwright, Puppeteer, and Selenium force you to write brittle code that breaks when the website makes even a slight UI change.

How does Stagehand make it more user-friendly?
By combining agents, tools, and Playwright, Stagehand lets you write deterministic code that is resilient to unpredictable DOM changes.
- Repeatability: Write code that can be repeated exactly the same way every time.
- Resilience: Write code that is resilient to unpredictable DOM changes.
It allows you to build browser automations that are as simple or as complex as you want, like the example above.

Installing Stagehand
Add Stagehand to an existing Node.js project
We highly recommend using the Node.js runtime environment to run Stagehand scripts, as opposed to newer alternatives like Deno or Bun.
Bun does not support Stagehand since it doesn’t support Playwright.
We strongly recommend using Stagehand in a new project with npx create-browser-app. Check out our quickstart guide to get started.
However, if you have an existing project, you can install Stagehand by installing the @browserbasehq/stagehand package and zod (for structured output).
npm install @browserbasehq/stagehand zod
You may also need to install the Playwright browser to run your Stagehand scripts, especially if you’re running locally.
# Useful for local development
npx playwright install
Then, you can use Stagehand in your project by importing the Stagehand class.
import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";
async function main() {
  const stagehand = new Stagehand({
    /**
     * With npx create-browser-app, this config is found
     * in a separate stagehand.config.ts file
     */
    env: "LOCAL",
    modelName: "gpt-4o",
    modelClientOptions: {
      apiKey: process.env.OPENAI_API_KEY,
    },
  });
  await stagehand.init();
  const page = stagehand.page;
  await page.goto("https://www.google.com");
  await page.act("Type in 'Browserbase' into the search bar");
  // Extract structured data that matches the zod schema
  const { title } = await page.extract({
    instruction: "The title of the first search result",
    schema: z.object({
      title: z.string(),
    }),
  });
  console.log(title);
  await stagehand.close();
}
main();
Building a Web Browsing Agent
Automating websites in 2025 isn’t just about clicking buttons — it’s about intelligently navigating unpredictable interfaces, handling dynamic forms, and extracting structured data. In this article, we’ll walk through how we built a hybrid AI-assisted web browsing agent to automate Maersk’s shipping portal, using tools like Stagehand, Claude AI, Playwright, and Zod for validation.
🧩 System Overview
We’re using the following components: Stagehand, Claude AI, Playwright, and Zod.
🛠️ Initialisation and Architecture
The class MaerskCarrier is the entry point. It initialises everything from browser context to AI agents:
this.stagehand = new Stagehand({
  env: "LOCAL",
  modelName: process.env.MODEL_NAME,
  headless: false,
  args: ['--disable-web-security', ...]
});
Why Stagehand?
Stagehand provides a high-level wrapper around Playwright, while supporting LLM-enhanced agents like Claude. This hybrid approach combines deterministic browser control with natural-language fallbacks — essential for modern websites with:
- Lazy-loaded DOMs
- Shadow DOM-based elements
- Multi-step forms with React event bindings
🔐 Hybrid Login Strategy
Login flows are notoriously non-deterministic. Maersk might present a cookie popup, a dynamic button labeled "Sign in," or redirect through a subdomain. Hence, we first attempt:
Browser-first actions:
await this.page.click('text="Login"');
await this.page.fill('input[name="username"]', this.username);
If that fails → Claude fallback:
This ensures reliability. Claude can parse visual layouts and bypass traditional selector issues that static automation tools struggle with.
We also wrap cookie handling and dynamic modal closures inside try/catch, giving Claude a chance to intervene if the page isn’t behaving as expected.
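The browser-first, agent-second pattern described above can be sketched as a small helper. This is a minimal sketch rather than the article's actual code: `withFallback` is a hypothetical name, and the two callbacks stand in for the Playwright calls and the Claude agent call respectively.

```typescript
// Hypothetical helper: try the deterministic path first, fall back to the agent on any error.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: (error: unknown) => Promise<T>,
): Promise<{ value: T; usedFallback: boolean }> {
  try {
    return { value: await primary(), usedFallback: false };
  } catch (error) {
    // Selector missing, timeout, unexpected modal, etc.: hand over to the fallback.
    return { value: await fallback(error), usedFallback: true };
  }
}
```

In the login flow, `primary` would wrap the `page.click` and `page.fill` calls, and `fallback` would wrap an `agent.execute(...)` instruction describing the same step. The `usedFallback` flag makes it easy to log how often the selectors have gone stale.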
🧠 Claude Agent: Natural Language as Automation Logic
Claude agents are used in three main areas:
- Complex UI interpretation (e.g., finding dynamic form fields)
- Sequential tasks (e.g., fill origin → destination → container)
- Data extraction from semi-structured HTML
Instead of writing 100 selectors or XPath expressions, you just describe the task:
await this.agent.execute(`
Fill the Maersk form:
FROM: Chennai
TO: Rotterdam
Use dropdowns and ensure correct selection.
`)
This is powerful in brittle UIs that may change labels but not underlying structure.
📄 Page Verification using Zod
After login, we verify the state of the page using Claude + zod schema validation.
z.object({
  hasBookingForm: z.boolean(),
  elementsFound: z.array(z.string()),
  formReadyForInput: z.boolean()
})
This allows us to semantically verify whether we’ve reached the booking form — not just by checking the URL but also by analyzing the page content intelligently.
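Once the extracted object has passed schema validation, deciding whether to proceed is plain code. The sketch below assumes the three fields from the schema above; the `PageState` type mirrors that schema, while the `isBookingPageReady` name and the `requiredElements` parameter are illustrative.

```typescript
// Mirrors the shape of the schema shown above.
type PageState = {
  hasBookingForm: boolean;
  elementsFound: string[];
  formReadyForInput: boolean;
};

// Returns true only when the form is present, ready, and every required element was seen.
function isBookingPageReady(state: PageState, requiredElements: string[] = []): boolean {
  const found = state.elementsFound.map((name) => name.toLowerCase());
  const hasAllRequired = requiredElements.every(
    (name) => found.indexOf(name.toLowerCase()) !== -1,
  );
  return state.hasBookingForm && state.formReadyForInput && hasAllRequired;
}
```

Comparing names case-insensitively is a design choice here, since an LLM may report "Origin" on one run and "origin" on the next.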
📦 Booking Form Fill with Optimization
We handle booking in three structured steps:
1. Origin/Destination Fields
await this.agent.execute(`
Fill origin: ${origin}
Fill destination: ${destination}
Select both from dropdowns.
`);
2. Commodity + Container Details
This is treated as one cohesive form segment:
await this.agent.execute(`
Set commodity to: Machinery
Set container: 20ft, Dry, 2 units
`);
3. Transport & Final Form Setup
Claude fills remaining fields (weight, pickup/delivery type, date), ensuring that dropdown dependencies are respected. This improves resilience to Maersk’s reactive UI changes.
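The three steps above can be driven from a single booking object. This is a sketch under assumptions: the `BookingRequest` field names and the wording of the third instruction are hypothetical, chosen only to illustrate how the instructions could be generated in order.

```typescript
// Hypothetical shape for one booking; field names are assumptions for illustration.
type BookingRequest = {
  origin: string;
  destination: string;
  commodity: string;
  container: string; // e.g. "20ft, Dry, 2 units"
  weightKg: number;
  date: string; // e.g. "2025-07-01"
};

// Builds one natural-language instruction per form segment, in the order they are filled.
function buildBookingInstructions(booking: BookingRequest): string[] {
  return [
    `Fill origin: ${booking.origin}\nFill destination: ${booking.destination}\nSelect both from dropdowns.`,
    `Set commodity to: ${booking.commodity}\nSet container: ${booking.container}`,
    `Set weight to ${booking.weightKg} kg and date to ${booking.date}.`,
  ];
}
```

Each string would then be passed to `agent.execute(...)` in sequence, so that a failure can be traced to a specific form segment.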
🚀 Submitting the Form
We try browser-first again:
await this.page.act({
  method: "click",
  selector: "button:has-text('Continue')"
});
Fallback if needed:
await this.agent.execute("Click the main button to proceed to shipping rates");
What are some of the common acts used in Stagehand?
Stagehand maps natural language to Playwright actions.
Each of these actions can be triggered using natural language commands. For example:
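As a minimal sketch, a list of natural-language commands can be run one after another through `act()`. The `ActPage` interface below is a stand-in for the one method used, and `runCommands` and `CommandResult` are hypothetical names.

```typescript
// Minimal stand-in for the one method this sketch relies on.
interface ActPage {
  act(instruction: string): Promise<unknown>;
}

type CommandResult = { command: string; ok: boolean; error?: string };

// Runs commands in order and stops at the first failure, since later
// commands usually depend on earlier ones having succeeded.
async function runCommands(page: ActPage, commands: string[]): Promise<CommandResult[]> {
  const results: CommandResult[] = [];
  for (const command of commands) {
    try {
      await page.act(command);
      results.push({ command, ok: true });
    } catch (error) {
      results.push({ command, ok: false, error: String(error) });
      break;
    }
  }
  return results;
}
```

Commands here would be strings such as "Click the search box" or "Type in 'Browserbase' into the search bar", as used earlier in this article.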
This modular structure lets us debug more effectively — every action has a retry mechanism and a screenshot is taken on failure.
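The retry-and-screenshot behaviour can be sketched as a generic wrapper. The article does not show its implementation, so `withRetry` and its parameters are hypothetical; the `onFailure` hook is where a screenshot would be captured.

```typescript
// Hypothetical retry wrapper: runs `action` up to `maxAttempts` times and calls
// `onFailure` (for example, to capture a screenshot) after each failed attempt.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts: number,
  onFailure: (error: unknown, attempt: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      await onFailure(error, attempt);
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

With Playwright, `onFailure` could call `page.screenshot({ path })` using a file name that includes the attempt number, so each failed attempt leaves its own image behind.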
📊 Rate Extraction with Structured Schema
Once we reach the rates page, we perform:
1. Natural-language overview
await this.agent.execute("Summarize all available shipping options");
2. Structured data extraction
z.object({
  rates: z.array(z.object({
    serviceName: z.string(),
    price: z.string().nullable(),
    transitTime: z.string().nullable()
  }))
})
This lets us ingest the results directly into APIs or databases for further processing.
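Because `price` is extracted as a nullable string, some normalization is usually needed before storing it. The sketch below is one possible approach and rests on an assumption: prices look like "USD 1,250.50", with commas as thousands separators. The `parsePrice` and `cheapestFirst` names are hypothetical.

```typescript
// Mirrors the extraction schema above.
type Rate = { serviceName: string; price: string | null; transitTime: string | null };

// Pulls the first number out of a price string such as "USD 1,250.50".
// Returns null when no price was extracted or no number is present.
function parsePrice(price: string | null): number | null {
  if (price === null) return null;
  const match = price.replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Keeps only rates with a usable price and sorts them from cheapest to most expensive.
function cheapestFirst(rates: Rate[]): Array<Rate & { amount: number }> {
  const priced: Array<Rate & { amount: number }> = [];
  for (const rate of rates) {
    const amount = parsePrice(rate.price);
    if (amount !== null) priced.push({ ...rate, amount });
  }
  return priced.sort((a, b) => a.amount - b.amount);
}
```

Prices in other locales (for example "1.250,50") would need a different parsing rule, so the format assumption should be checked against real extraction output.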
📼 Replay and Debugging
All actions performed by the Claude agent are recorded. We can generate a replay script that re-runs the entire flow deterministically:
await this.saveActionHistory('maersk_replay.js');
This is especially helpful during testing or regression validation.
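The implementation of `saveActionHistory` is not shown, so the following is only one way such a history could be turned into a script. It assumes each recorded action has been resolved to a method and selector, in the same object form passed to `page.act` elsewhere in this article; `ActionRecorder` and `RecordedAction` are hypothetical names.

```typescript
// Hypothetical record of one resolved action.
type RecordedAction = {
  description: string;
  method: string;
  selector: string;
  arguments?: string[];
};

class ActionRecorder {
  private readonly history: RecordedAction[] = [];

  record(action: RecordedAction): void {
    this.history.push(action);
  }

  // Renders the history as source text: one page.act(...) call per recorded action.
  toReplayScript(): string {
    const lines = this.history.map(
      (action) => `  await page.act(${JSON.stringify(action)});`,
    );
    return ["async function replay(page) {", ...lines, "}"].join("\n");
  }
}
```

Serializing with `JSON.stringify` keeps quotes inside selectors correctly escaped, and replaying resolved selectors rather than the original prompts avoids another round of LLM calls.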
🧠 Rate-Limiting Claude
Claude API calls are rate-limited:
if (timeSinceLastCall < this.claudeRateLimit) {
  logger.warning('Rate limiting Claude call...');
}
A simple Date.now() diff ensures that we don’t overload Claude or run into 429s. This keeps things stable and production-safe.
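The snippet above only shows the check, not what happens when the limit is hit. A minimal sketch of a limiter that waits out the remaining interval might look like this; `MinIntervalLimiter` is a hypothetical name, and the clock and sleep function are injectable purely so the logic can be tested without real delays.

```typescript
// Hypothetical limiter built on a Date.now() diff.
class MinIntervalLimiter {
  private lastCallAt: number | null = null;

  constructor(
    private readonly minIntervalMs: number,
    private readonly now: () => number = () => Date.now(),
    private readonly sleep: (ms: number) => Promise<void> = (ms) =>
      new Promise<void>((resolve) => {
        setTimeout(resolve, ms);
      }),
  ) {}

  // Waits until at least minIntervalMs has passed since the previous call, then runs `call`.
  async run<T>(call: () => Promise<T>): Promise<T> {
    if (this.lastCallAt !== null) {
      const timeSinceLastCall = this.now() - this.lastCallAt;
      if (timeSinceLastCall < this.minIntervalMs) {
        await this.sleep(this.minIntervalMs - timeSinceLastCall);
      }
    }
    this.lastCallAt = this.now();
    return call();
  }
}
```

Every Claude request would then go through `limiter.run(() => ...)`, which keeps the spacing logic in one place instead of scattered across call sites.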
📘 Final Thoughts
Building a Claude-assisted web browsing agent is not just about scripting; it's about building an intelligent interaction layer on top of unpredictable UIs. With tools like Stagehand, Zod, and Claude, you’re not just automating clicks — you’re enabling autonomous decision-making in the browser.
This hybrid approach scales well across portals like Maersk, Hapag-Lloyd, and others — especially when UI changes frequently or uses JavaScript-heavy interfaces.