Stagehand
Stagehand is an open-source, LLM-powered web automation framework built on Playwright that enables natural language-driven browser interactions. It uses a multi-step pipeline with DOM parsing, action planning, and execution, supporting multiple LLM backends like GPT-4 and Claude.
Architecture Breakdown
Stagehand follows a three-stage pipeline: Observe, Plan, and Act. In the Observe stage, it serializes the DOM into a simplified text representation (using XPaths and element IDs) to reduce token usage. In the Plan stage, an LLM (e.g., GPT-4) determines the next action from the user’s instruction and the current page state. In the Act stage, the chosen action is executed via Playwright commands (click, type, scroll, etc.).
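The control flow of such a pipeline can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Stagehand's actual implementation: the `Pipeline` interface, the `Step` shape, the `"done"` sentinel, and the `maxSteps` cap are all hypothetical names introduced here.

```typescript
// Hypothetical step shape; "done" is an assumed sentinel meaning the task is finished.
interface Step {
  action: string;
  selector?: string;
}

// Hypothetical interface: one method per pipeline stage.
interface Pipeline {
  observe(): Promise<string>; // serialize current page state
  plan(instruction: string, state: string): Promise<Step>; // LLM call
  act(step: Step): Promise<void>; // browser command
}

// Repeats observe -> plan -> act until the planner reports "done"
// or the step budget is exhausted. Returns the steps that were executed.
async function runTask(
  pipeline: Pipeline,
  instruction: string,
  maxSteps: number = 10,
): Promise<Step[]> {
  const executed: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const state = await pipeline.observe();
    const next = await pipeline.plan(instruction, state);
    if (next.action === "done") break;
    await pipeline.act(next);
    executed.push(next);
  }
  return executed;
}
```

Injecting the three stages through an interface keeps the loop testable without a browser or an LLM, and the step cap guards against a planner that never reports completion.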
Key components:
- DOM Parser: Extracts interactive elements (buttons, inputs, links) and their attributes, generating a compact JSON-like structure.
- Action Planner: An LLM call that outputs a structured action (e.g., {action: "click", selector: "#submit-btn"}).
- Executor: Maps LLM actions to Playwright locators and performs the browser interaction.
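To make the planner-to-executor handoff concrete, the following sketch validates a planner response and dispatches it to a browser object. It rests on several assumptions: the `PlannedAction` shapes and the `BrowserLike` interface are hypothetical, and the planner is assumed to emit strict JSON with quoted keys, which is stricter than the JSON-like example above.

```typescript
// Hypothetical action shapes for illustration; Stagehand's internal format may differ.
type PlannedAction =
  | { action: "click"; selector: string }
  | { action: "type"; selector: string; text: string };

// Minimal browser surface the executor needs (an assumption of this sketch).
// Playwright's Page exposes click(selector) and fill(selector, value),
// so a Page should be usable here.
interface BrowserLike {
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
}

// Parses raw LLM output and rejects anything that is not a known action.
function parseAction(raw: string): PlannedAction {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("action must be a JSON object");
  }
  const a = data as Record<string, unknown>;
  if (a.action === "click" && typeof a.selector === "string") {
    return { action: "click", selector: a.selector };
  }
  if (a.action === "type" && typeof a.selector === "string" && typeof a.text === "string") {
    return { action: "type", selector: a.selector, text: a.text };
  }
  throw new Error("invalid action: " + raw);
}

// Maps a validated action onto the corresponding browser call.
async function execute(browser: BrowserLike, a: PlannedAction): Promise<void> {
  if (a.action === "click") {
    await browser.click(a.selector);
  } else {
    await browser.fill(a.selector, a.text);
  }
}
```

Validating before executing means a malformed planner response fails with a clear error instead of reaching the browser.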
Benchmarks & Telemetry
In internal benchmarks on a set of 100 common web tasks (form filling, navigation, data extraction), Stagehand achieved:
- Success Rate: 87% (first attempt, no retries)
- Average Task Time: 14.2 seconds (including LLM inference)
- Token Cost per Task: ~15k tokens (input + output) using GPT-4
- DOM Parsing Speed: 2.1 seconds for a typical page with ~500 elements
Notably, Stagehand struggles with dynamically loaded content, such as single-page applications (SPAs) and other sites with heavy JavaScript rendering, and often needs additional wait strategies before it can act reliably.
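One possible wait strategy is to poll a readiness condition before acting. The helper below is a generic sketch, not part of Stagehand; `waitUntil` and its default timeout and interval values are assumptions chosen for illustration.

```typescript
// Polls `condition` until it returns true or `timeoutMs` elapses.
// Resolves to true if the condition was met, false on timeout.
async function waitUntil(
  condition: () => Promise<boolean> | boolean,
  timeoutMs: number = 5000,
  intervalMs: number = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;
    if (Date.now() >= deadline) return false;
    // Sleep before the next check so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With a Playwright page, the condition could check that a target element exists before the next instruction is issued. Playwright's built-in `page.waitForSelector` or `locator.waitFor` may be a simpler choice when the selector is known in advance.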
Developer Experience
Stagehand offers a simple API:
import { Stagehand } from '@browserbasehq/stagehand';
// Create the client and start the browser session.
const stagehand = new Stagehand({ apiKey: '...' });
await stagehand.init();
// Navigate with the underlying Playwright page, then drive it with natural language.
await stagehand.page.goto('https://example.com');
await stagehand.act('Click the login button');
const result = await stagehand.extract('Get the page title');
It supports both headless and headed modes, and can be configured to use different LLM models. However, there is no built-in proxy rotation or stealth plugin, making it detectable by anti-bot systems. The project is actively maintained on GitHub with good documentation.
Limitations
- Token Efficiency: Full DOM serialization is expensive; for complex pages, token usage can exceed 30k per step.
- Error Handling: If the LLM produces an invalid action (e.g., a non-existent selector), Stagehand does not automatically retry or fall back to an alternative.
- No Captcha Solving: Stagehand relies on the underlying Playwright browser; captchas must be handled externally.
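The missing retry behavior noted under Error Handling can be added in user code. The wrapper below is a minimal sketch, assuming failures surface as thrown errors; `withRetry`, its parameters, and the exponential backoff schedule are hypothetical choices, not something Stagehand provides.

```typescript
// Runs `attempt` up to `maxAttempts` times, waiting longer after each failure.
// The attempt number is passed in so the caller can vary its approach,
// for example by rephrasing the instruction on a later try.
async function withRetry<T>(
  attempt: (tryNumber: number) => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 200,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return await attempt(n);
    } catch (err) {
      lastError = err;
      if (n < maxAttempts) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
        const delay = baseDelayMs * Math.pow(2, n - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Under these assumptions, a call such as `stagehand.act('Click the login button')` could be wrapped as `withRetry(() => stagehand.act('Click the login button'))`. Retrying only helps with transient failures; a consistently wrong selector needs a new plan rather than a repeat of the same action.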
Conclusion
Stagehand is a powerful tool for rapid prototyping and simple automation tasks where natural language is preferred. However, for production-grade scraping at scale, additional layers (stealth, proxy, retry logic) are necessary.