Issue #1022
Web scraping used to mean writing brittle scripts that broke whenever a site changed. Now AI agents can browse the web like humans do. They read pages, click buttons, fill forms, and extract data without you writing CSS selectors for every element.
This guide covers five tools that make AI-powered browser automation possible. Some are low-level SDKs. Others handle everything from navigation to data extraction. Pick the right one based on how much control you need.
The Foundation: Playwright and Puppeteer
Before AI enters the picture, you need something to control the browser. That’s where Playwright and Puppeteer come in.
Puppeteer is Google’s Node.js library for Chrome automation. It talks directly to Chrome through the DevTools Protocol. Simple and fast for Chrome-only tasks.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
Playwright does the same thing but works across Chrome, Firefox, and Safari. Microsoft built it with better auto-waiting, so you spend less time writing waitForSelector calls.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('https://example.com')
    page.screenshot(path='screenshot.png')
    browser.close()
Neither tool has AI built in. They’re the hands. AI provides the brain.
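To make that split concrete, here is a minimal sketch of the control loop that agent libraries build on top of Playwright or Puppeteer. The model picks one action at a time; the browser executes it. Both sides are stubbed here (the `StubLLM` and `StubBrowser` classes are illustrative, not part of any library) so only the loop itself is shown:

```python
def run_agent(llm, browser, task, max_steps=10):
    """Ask the model for one action at a time until it signals completion."""
    history = []
    for _ in range(max_steps):
        observation = browser.observe()                # e.g. DOM snapshot or screenshot
        action = llm.decide(task, observation, history)
        if action["type"] == "done":
            return history
        browser.execute(action)                        # click, type, navigate, ...
        history.append(action)
    return history

class StubBrowser:
    def observe(self):
        return "<button id='buy'>Buy</button>"
    def execute(self, action):
        pass

class StubLLM:
    def __init__(self):
        # A canned two-step plan standing in for real model output.
        self.plan = [{"type": "click", "target": "buy"}, {"type": "done"}]
    def decide(self, task, observation, history):
        return self.plan.pop(0)

actions = run_agent(StubLLM(), StubBrowser(), "buy the thing")
print(actions)
```

Every tool below is some variation on this loop: what differs is who writes the `decide` step and where the browser runs.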
Browser-Use: Let AI Control the Browser
Browser-Use is a Python library that connects language models to browser automation. You describe a task in plain English, and the AI figures out what to click and type.
Install it with Python 3.11 or higher:
uv init && uv add browser-use && uv sync
Here’s a basic example. The agent finds a product price without you writing any selectors:
from browser_use import Agent, Browser, ChatBrowserUse
import asyncio

async def main():
    browser = Browser()
    agent = Agent(
        task="Find the price of the MacBook Pro on Apple's website",
        llm=ChatBrowserUse(),
        browser=browser,
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
The library works with multiple LLM providers. Use ChatBrowserUse() for their optimized model, or swap in Claude, Gemini, or local models through Ollama. The AI sees the page, identifies interactive elements, and decides what actions to take.
Browser-Use shines when the task is clear but the path isn’t. Search for a product, compare prices across tabs, fill out a multi-step form. You describe the goal, and the agent handles navigation.
Browserbase: Cloud Browsers for Scale
Running browsers locally works for development. Production needs something else. Browserbase provides cloud browser infrastructure that handles scaling, proxies, and captchas.
Connect your existing Playwright code to Browserbase with minimal changes:
from playwright.sync_api import sync_playwright
from browserbase import Browserbase
import os

bb = Browserbase(api_key=os.environ["BROWSERBASE_API_KEY"])

with sync_playwright() as playwright:
    session = bb.sessions.create()
    browser = playwright.chromium.connect_over_cdp(session.connect_url)
    page = browser.contexts[0].pages[0]
    page.goto('https://example.com')
    print(page.title())
    browser.close()
The same code that runs locally now runs on Browserbase’s servers. They handle proxy rotation, fingerprint randomization, and captcha solving. Your scripts become more reliable without extra code.
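One practical consequence of cheap cloud sessions: when a run fails, you can discard the session and retry with a fresh browser instead of nursing one long-lived local instance. A hypothetical helper sketching that pattern (the `create_session` and `job` callables are stand-ins for `bb.sessions.create()` plus your Playwright work, not Browserbase API):

```python
def with_retries(create_session, job, attempts=3):
    """Run job(session), requesting a clean session for each attempt."""
    last_error = None
    for _ in range(attempts):
        session = create_session()            # each attempt gets a fresh browser
        try:
            return job(session)
        except Exception as err:
            last_error = err                  # discard the session, try a new one
    raise last_error

# Demo with a flaky stub: the first two sessions fail, the third succeeds.
made = []
def fake_session():
    made.append(object())
    return len(made)

def fetch_title(session_id):
    if session_id < 3:
        raise RuntimeError("transient browser error")
    return "Example Domain"

result = with_retries(fake_session, fetch_title)
print(result)
```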
Browserbase also offers Stagehand, their AI automation framework. It sits between raw Playwright and full AI agents:
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "BROWSERBASE",
  model: {
    modelName: "google/gemini-3-flash-preview",
    apiKey: process.env.MODEL_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];
await page.goto("https://news.ycombinator.com");

// AI handles the clicking
await stagehand.act("click on the comments link for the top story");

// Extract structured data
const data = await stagehand.extract("extract the title and points of the top story");
console.log(data);

await stagehand.close();
Use code when you know exactly what to do. Use natural language when the page structure might vary.
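That split can be made explicit in your own code: route steps you have stable selectors for through the deterministic path, and fall back to a natural-language action otherwise. A sketch of the pattern, with a stubbed page and a plain callable standing in for something like `stagehand.act` (the selector and step names are made up for illustration):

```python
# Steps with known, stable structure get hard-coded selectors.
KNOWN_SELECTORS = {"open_top_story": "tr.athing:first-child a"}

def perform(page, step, act):
    """Use a selector when we have one; otherwise hand the step to the AI."""
    selector = KNOWN_SELECTORS.get(step)
    if selector:
        page.click(selector)      # deterministic: structure is known
        return "code"
    act(step)                     # natural language: let the AI find it
    return "ai"

class StubPage:
    def __init__(self):
        self.clicked = []
    def click(self, selector):
        self.clicked.append(selector)

prompts = []
page = StubPage()
mode_known = perform(page, "open_top_story", prompts.append)
mode_fuzzy = perform(page, "click on the comments link for the top story", prompts.append)
print(mode_known, mode_fuzzy)
```

The deterministic path stays fast and free; the AI path absorbs the pages you can't predict.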
Firecrawl: Web Data for LLMs
Firecrawl takes a different approach. Instead of controlling a browser, it converts websites into clean data that AI can process.
Send a URL, get back markdown or structured JSON:
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://example.com", formats=["markdown"])
print(doc["markdown"])
The real power is structured extraction. Define a schema, and Firecrawl pulls exactly the data you need:
from pydantic import BaseModel
from firecrawl import Firecrawl

class CompanyInfo(BaseModel):
    name: str
    description: str
    is_hiring: bool

app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.scrape(
    "https://stripe.com",
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)
print(result)
Firecrawl also has an agent mode. Describe what data you want, and it searches and navigates to find it:
result = app.agent(prompt="Find the pricing for OpenAI's GPT-4 API")
print(result.data)
No URLs required. The AI figures out where to look.