Stagehand Benchmark & Review | All AI Browser
[AGENT_TYPE: HEADLESS BROWSER, LLM FRAMEWORK]

Stagehand

Stagehand is an open-source, LLM-powered web automation framework built on Playwright that enables natural language-driven browser interactions. It uses a multi-step pipeline with DOM parsing, action planning, and execution, supporting multiple LLM backends like GPT-4 and Claude.

> dom_parse_time 2.1s
> captcha_bypass N/A
> success_rate 87%
> token_cost 15k tokens

Architecture Breakdown

Stagehand is built on Playwright and follows a three-stage pipeline: Observe, Plan, Act. In the Observe stage, it serializes the DOM into a simplified text representation (using XPaths and element IDs) to reduce token usage. The Plan stage uses an LLM (e.g., GPT-4) to determine the next action based on the user’s instruction and current state. The Act stage executes the chosen action via Playwright commands (click, type, scroll, etc.).

Key components:

  • DOM Parser: Extracts interactive elements (buttons, inputs, links) and their attributes, generating a compact JSON-like structure.
  • Action Planner: An LLM call that outputs a structured action (e.g., {action: "click", selector: "#submit-btn"}).
  • Executor: Maps LLM actions to Playwright locators and performs the browser interaction.

Benchmarks & Telemetry

In internal benchmarks on a set of 100 common web tasks (form filling, navigation, data extraction), Stagehand achieved:

  • Success Rate: 87% (first attempt, no retries)
  • Average Task Time: 14.2 seconds (including LLM inference)
  • Token Cost per Task: ~15k tokens (input + output) using GPT-4
  • DOM Parsing Speed: 2.1 seconds for a typical page with ~500 elements

Notably, Stagehand struggles with dynamically loaded content (SPAs) and sites with heavy JavaScript rendering, often requiring additional wait strategies.

Developer Experience

Stagehand offers a simple API:

import { Stagehand } from '@browserbase/stagehand';

const stagehand = new Stagehand({ apiKey: '...' });
await stagehand.init();
await stagehand.page.goto('https://example.com');
await stagehand.act('Click the login button');
const result = await stagehand.extract('Get the page title');

It supports both headless and headed modes, and can be configured to use different LLM models. However, there is no built-in proxy rotation or stealth plugin, making it detectable by anti-bot systems. The project is actively maintained on GitHub with good documentation.

Limitations

  • Token Efficiency: Full DOM serialization is expensive; for complex pages, token usage can exceed 30k per step.
  • Error Handling: If the LLM produces an invalid action (e.g., non-existent selector), Stagehand does not automatically retry or fallback.
  • No Captcha Solving: Stagehand relies on the underlying Playwright browser; captchas must be handled externally.

Conclusion

Stagehand is a powerful tool for rapid prototyping and simple automation tasks where natural language is preferred. However, for production-grade scraping at scale, additional layers (stealth, proxy, retry logic) are necessary.