
Specialization 3: Agentic Browser Automation

Write instructions, not scripts — tell an AI agent what to do in the browser, and it does it.

INFO

  • Time: ~3-4 hours
  • Difficulty: Beginner-Intermediate
  • Prerequisites: Core Modules completed

This Page Covers

  • How agentic browser automation differs from traditional scripting
  • How Playwright MCP gives an agent browser control
  • Setting up Playwright MCP
  • Writing effective agent instructions (the core skill)
  • Practical examples as instruction documents
  • Debugging when the agent gets stuck
  • Ethics, limits, and responsible automation

The Shift: From Scripts to Instructions

The Old Way

Traditional browser automation meant learning a programming language. You would:

  1. Install Node.js and Playwright (or Selenium, or Puppeteer)
  2. Learn the JavaScript API for browser control
  3. Write code to find elements using CSS selectors
  4. Debug timing issues, broken selectors, and flaky tests
  5. Maintain scripts when websites change

This was powerful but had a high barrier to entry. Many people who could have benefited from browser automation never got past step 2.

The New Way

With agentic browser automation, you write a structured document describing what you want to happen. The AI agent reads your instructions, controls the browser, and reports the results.

You do not write code. You do not learn CSS selectors. You do not debug JavaScript. Instead, you describe what you see on the screen and what should happen, and the agent figures out the implementation.

Example — the old way (JavaScript):

javascript
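import { chromium } from 'playwright'; // run as an ES module (e.g. login.mjs); needs the playwright package
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com/login');
await page.fill('input[name="email"]', 'user@example.com');
await page.fill('input[name="password"]', 'mypassword');
await page.click('button[type="submit"]');
await page.waitForURL('**/dashboard');
await browser.close();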

Example — the new way (instructions):

markdown
1. Go to https://example.com/login
2. Enter "user@example.com" in the email field
3. Enter the password in the password field
4. Click the "Sign In" button
5. Wait until the dashboard page loads
6. Take a screenshot to confirm

The skill shifts from "knowing a programming API" to "writing clear instructions for an agent."

Honest Limitations

Agentic browser automation is not magic. It works well for:

  • Straightforward page interactions (clicks, typing, navigation)
  • Pages with clear, labeled UI elements
  • Repetitive tasks with predictable steps

It struggles with:

  • Complex dynamic pages (heavy JavaScript, infinite scrolls)
  • Unusual UI patterns (custom drag-and-drop, canvas-based interfaces)
  • CAPTCHA and bot-detection systems
  • Tasks requiring pixel-perfect precision

When the agent gets confused, you can usually fix it by writing more specific instructions. We cover debugging techniques later.


How Agentic Browser Automation Works

Playwright MCP

Playwright is an open-source browser automation framework from Microsoft. Playwright MCP is a Model Context Protocol (MCP) server that lets an AI agent control a browser through Playwright, without you writing any Playwright code.

The flow:

  1. You write instructions describing what should happen
  2. The agent reads your instructions and interprets each step
  3. Playwright MCP translates the agent's actions into browser commands
  4. The browser executes the commands (click, type, navigate, screenshot)
  5. The agent observes the result and proceeds to the next step
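Step 3 happens through plain messages. MCP clients and servers talk in JSON-RPC 2.0, and calling one of a server's tools uses the tools/call method. The sketch below builds two such requests for steps of the login example. The tool names browser_navigate and browser_click and their argument shapes are assumptions based on Playwright MCP's tool list, which can change between versions; the element reference "e12" is hypothetical.

```javascript
// Sketch: the JSON-RPC requests an agent's MCP client might send for two steps.
// "tools/call" is part of the Model Context Protocol; the tool names and
// arguments are assumptions based on Playwright MCP and may differ by version.
let nextId = 1;

function toolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Step: "Go to https://example.com/login"
const navigate = toolCall("browser_navigate", { url: "https://example.com/login" });

// Step: "Click the 'Sign In' button". The agent first reads a snapshot of the
// page, then names the element and points at it with a reference from that
// snapshot ("e12" is a hypothetical reference).
const click = toolCall("browser_click", { element: "Sign In button", ref: "e12" });

console.log(JSON.stringify([navigate, click], null, 2));
```

Your instructions never contain these messages; the agent writes them. Seeing them makes it clear why the human-readable element description matters: it is what the agent matches against the page.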

What the Agent Can Do

Through Playwright MCP, the agent can:

  • Navigate — go to URLs, click links, go back/forward
  • Read — see page content, read text from elements
  • Interact — click buttons, fill forms, select dropdowns, check boxes
  • Wait — pause until elements appear or pages load
  • Capture — take screenshots, extract text or data from tables
  • Evaluate — run simple checks ("does the page contain this text?")

What the Agent Cannot Do

  • Bypass CAPTCHAs or bot detection
  • Interact with native OS dialogs (file pickers, print dialogs)
  • Work with browser extensions
  • Handle complex drag-and-drop or canvas-based UIs reliably
  • Maintain sessions across separate agent conversations (unless you handle cookies)

Setting Up Playwright MCP

Prerequisites

  • Node.js installed (nodejs.org — LTS version)
  • Claude Desktop or Claude Code installed

Option A: Claude Code

Add Playwright MCP to your project by creating a .mcp.json file in your project root (to make it available in all your projects instead, register it at user scope with the claude mcp add command):

json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}

After saving, restart Claude Code so it picks up the new MCP server.
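If your .mcp.json already lists other MCP servers, overwriting the file with the JSON above would drop them. Since Node.js is already a prerequisite, a few lines can merge the Playwright entry in instead. This is a hypothetical helper, a sketch that assumes the file holds valid JSON with a top-level mcpServers object like the one shown.

```javascript
// Sketch: add the Playwright MCP entry to an MCP config file without
// removing servers that are already configured.
// Usage: node add-playwright.js .mcp.json
const fs = require("node:fs");

function addPlaywrightServer(config) {
  const servers = { ...(config.mcpServers || {}) };
  // Same entry as the JSON above.
  servers.playwright = { command: "npx", args: ["-y", "@playwright/mcp"] };
  return { ...config, mcpServers: servers };
}

function updateConfigFile(file) {
  const existing = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, "utf8")) : {};
  const updated = addPlaywrightServer(existing);
  fs.writeFileSync(file, JSON.stringify(updated, null, 2) + "\n");
  return updated;
}

// Only write when a file path is passed on the command line.
const target = process.argv[2];
if (target) updateConfigFile(target);
```

Because the Claude Desktop configuration uses the same mcpServers shape, the same script works on claude_desktop_config.json.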

Option B: Claude Desktop

Add Playwright MCP to your Claude Desktop configuration file. Its location depends on your operating system:

  • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

The entry is the same on both:

json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}

After saving, restart Claude Desktop completely (quit and reopen, not just close the window).
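If you are unsure where that file lives on your machine, the Mac and Windows paths above can be resolved with Node.js. A minimal sketch; claudeDesktopConfigPath is a hypothetical helper name, and only the two platforms this page covers are handled.

```javascript
// Sketch: resolve the Claude Desktop config path for macOS and Windows,
// using the locations listed on this page. Other platforms return null.
const os = require("node:os");
const path = require("node:path");

function claudeDesktopConfigPath(platform = process.platform, env = process.env, home = os.homedir()) {
  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Application Support", "Claude", "claude_desktop_config.json");
  }
  if (platform === "win32" && env.APPDATA) {
    return path.win32.join(env.APPDATA, "Claude", "claude_desktop_config.json");
  }
  return null; // not covered by this page
}

console.log(claudeDesktopConfigPath() ?? "No Claude Desktop config path known for this platform");
```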

Verify It Works

Test with a simple instruction:

"Open https://example.com and take a screenshot."

If the agent opens a browser, navigates to the page, and returns a screenshot, your setup is working.

First run downloads a browser

The first time Playwright MCP runs, it will automatically download a Chromium browser (~150 MB). This is a one-time download — subsequent runs start immediately. If it seems to hang for a minute on the first launch, this is why.


Running Your First Automation

Setup is done. Before we get into writing longer instruction documents, let's see what the actual experience looks like — you type a plain-language instruction, and the agent does the rest.

A Concrete Example

Open Claude Desktop (or Claude Code) with Playwright MCP configured. Type the following into the chat:

Go to https://news.ycombinator.com. Read the top 5 headlines and tell me their titles and point counts.

That is it. No special syntax, no file to reference, no command to run. You type your request in plain language, just like you would ask a colleague.

Here is what the agent does behind the scenes:

  1. Launches a browser — Playwright MCP opens a Chromium window (you may see it flash open)
  2. Navigates to the page — the agent goes to the URL you specified
  3. Reads the page content — the agent inspects the page to find the headlines
  4. Extracts the data — it pulls out the titles and point counts
  5. Reports back — you get a structured response in the chat, like:

Here are the top 5 headlines from Hacker News:

  1. "Show HN: I built a tool for..." — 342 points
  2. "Why databases should..." — 289 points
  3. ...

The entire interaction happens in the chat window. The agent handles the browser; you just read the result.

What You Can Try Next

Once that first example works, try a few more single-instruction automations of your own to build confidence. Keep each one a one-shot instruction: you type it, the agent does it, you see the result.

From Chat Instructions to Instruction Documents

For quick tasks, typing directly into the chat works well. But as your automations get more complex — multi-step workflows, login flows, branching logic — writing your instructions in a Markdown file first has real advantages:

  • You can review and refine before running
  • You can reuse the same instructions next week
  • You can share them with colleagues
  • You can version-control them in your workspace (from Specialization 1)

The next section teaches you how to write these structured instruction documents.


Writing Effective Agent Instructions

This is the core skill of agentic browser automation. The quality of your instructions directly determines whether the agent succeeds or fails.

Structure as Numbered Steps

Always write instructions as a clear numbered sequence:

markdown
## Check inventory on supplier website

1. Go to https://supplier.example.com
2. Click the "Login" button in the top right
3. Enter the username in the email field
4. Enter the password in the password field
5. Click "Sign In"
6. Navigate to the "Inventory" section using the left sidebar
7. Search for "Widget A" using the search box
8. Note the current stock number shown next to "Widget A"
9. Take a screenshot of the results
10. Report the stock number back to me

Be Specific About Visual Elements

Do not reference code or technical selectors. Describe what you see on the screen:

Too vague:

"Click the button"

Good:

"Click the blue 'Submit' button at the bottom of the form"

Also good:

"Click the 'Sign In' link in the navigation bar at the top of the page"

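The agent finds elements mainly by their visible text and their type (button, link, text field): by default, Playwright MCP gives it a structured snapshot of the page's accessibility tree rather than an image. Details such as color or position help it choose between similar elements, especially when it also takes a screenshot, so always include the exact label text.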

Include Wait Conditions

Web pages take time to load. Tell the agent when to wait:

markdown
5. Click "Generate Report"
6. Wait until the loading spinner disappears and the report table is visible
7. Take a screenshot of the report

Without wait conditions, the agent might try to read or interact with content that has not loaded yet.
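A condition-based wait ("until the spinner disappears") beats a fixed pause because a fixed pause is either too short on a slow day or wastes time on a fast one. The idea can be sketched as polling with a time limit. This illustrates the concept only; it is not how Playwright implements waiting, and waitFor is a hypothetical name.

```javascript
// Sketch: "wait until the report table is visible" as repeated checks
// with a time limit. Not Playwright's implementation.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitFor(condition, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return true; // condition met: go on to the next step
    await sleep(intervalMs); // not yet: check again shortly
  }
  return false; // timed out: take a screenshot and report instead of guessing
}

// Usage: a fake report that finishes loading after about 200 ms.
const readyAt = Date.now() + 200;
waitFor(() => Date.now() >= readyAt, { timeoutMs: 5000, intervalMs: 50 }).then((ok) =>
  console.log(ok ? "Report visible: take the screenshot" : "Timed out: report the problem"),
);
```

Returning false on timeout, rather than carrying on, mirrors the advice to describe what the agent should do when the expected state never appears.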

Include Branching Instructions

Real workflows have conditions. Handle them:

markdown
7. Look for the "In Stock" or "Out of Stock" label
8. If "In Stock": note the quantity, skip step 9, and continue with the remaining steps
9. If "Out of Stock": take a screenshot, report "out of stock", and stop

Describe Expected Outcomes

Tell the agent what success looks like:

markdown
10. After submission, you should see a confirmation message that says "Your order has been placed"
11. If you see an error message instead, take a screenshot and report the error text

Common Instruction Patterns

Login flow:

markdown
1. Go to [URL]
2. Enter [username] in the email/username field
3. Enter [password] in the password field
4. Click the "Sign In" / "Login" button
5. Wait until the dashboard or home page loads
6. Take a screenshot to confirm successful login

Form filling:

markdown
1. Go to [form URL]
2. Fill in the "Name" field with [value]
3. Fill in the "Email" field with [value]
4. Select [option] from the "Category" dropdown
5. In the "Message" text area, type: [text]
6. Check the "I agree to terms" checkbox
7. Take a screenshot of the filled form before submitting
8. Click the "Submit" button
9. Wait for the confirmation page to load

Data extraction:

markdown
1. Go to [URL]
2. Wait for the data table to load
3. Read all rows from the table
4. For each row, capture: [column A], [column B], [column C]
5. Report the data in a formatted list

Screenshot sequence:

markdown
1. Go to [URL]
2. Take a screenshot of the full page
3. Click on [section/tab]
4. Wait for the content to load
5. Take a screenshot
6. Repeat for [next section/tab]

Practical Examples

Each example is a complete instruction document you could give to an agent.

Example 1: Checking a Page for Changes

markdown
## Check pricing page for changes

Goal: Compare the current pricing on example.com with what I saw last time.

1. Go to https://example.com/pricing
2. Wait for the pricing cards to load
3. Take a screenshot of the full pricing page
4. Read the price listed for each plan (Starter, Pro, Enterprise)
5. Report the prices in this format:
   - Starter: $X/month
   - Pro: $X/month
   - Enterprise: $X/month
6. Note any differences from these previous values:
   - Starter was $9/month
   - Pro was $29/month
   - Enterprise was $99/month
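Step 6 is a comparison. If you keep the prices the agent reports, the same check can be done outside the chat. A minimal sketch; diffPrices is a hypothetical helper, the plan names and old prices come from the example, and the new prices in the usage line are made up.

```javascript
// Sketch of step 6: compare newly read prices with the previous values.
function diffPrices(previous, current) {
  const changes = [];
  for (const [plan, oldPrice] of Object.entries(previous)) {
    const newPrice = current[plan];
    if (newPrice === undefined) {
      changes.push(`${plan}: no longer listed`);
    } else if (newPrice !== oldPrice) {
      changes.push(`${plan}: $${oldPrice}/month -> $${newPrice}/month`);
    }
  }
  // Plans that appear now but were not listed before.
  for (const [plan, price] of Object.entries(current)) {
    if (!(plan in previous)) changes.push(`${plan}: new plan at $${price}/month`);
  }
  return changes;
}

const previous = { Starter: 9, Pro: 29, Enterprise: 99 };
console.log(diffPrices(previous, { Starter: 9, Pro: 35, Enterprise: 99 }));
// prints [ 'Pro: $29/month -> $35/month' ]
```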

Example 2: Filling Out a Form

markdown
## Submit weekly status report

1. Go to https://intranet.example.com/status-form
2. Log in if needed (username: me@company.com)
3. In the "Week of" field, enter this week's Monday date
4. In the "Completed" text area, enter:
   - Finished API documentation
   - Deployed v2.3 to staging
   - Resolved 3 support tickets
5. In the "Next Week" text area, enter:
   - Begin database migration
   - Team planning session
6. Select "On Track" from the "Project Status" dropdown
7. Take a screenshot of the completed form
8. Do NOT click Submit — I will review the screenshot first

Example 3: Extracting Data From a Table

markdown
## Extract conference speaker data

1. Go to https://conference.example.com/speakers
2. Wait for the speaker cards to load
3. For each speaker on the page, capture:
   - Name
   - Company
   - Talk title
   - Time slot
4. If there is a "Load More" or pagination button, click it and capture additional speakers
5. Continue until all speakers are captured
6. Report the complete list in a table format
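Steps 4 and 5 describe a loop: capture, click "Load More", repeat until the button is gone. The sketch below adds a page cap as an assumption of its own, since a list with infinite scroll or a button that never disappears could otherwise keep going indefinitely; adding "stop after N pages" to your instructions serves the same purpose. loadPage is a hypothetical stand-in for one round of reading cards and clicking the button.

```javascript
// Sketch: "click Load More and keep capturing until all speakers are captured".
// loadPage(n) is hypothetical: it returns the items on round n and whether
// a "Load More" button is still present. maxPages is an assumed safety cap.
async function collectAll(loadPage, { maxPages = 20 } = {}) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const { items, hasMore } = await loadPage(page);
    all.push(...items);
    if (!hasMore) return { items: all, complete: true };
  }
  // Hit the cap: report what was captured and flag the list as possibly incomplete.
  return { items: all, complete: false };
}

// Usage with fake data: three rounds of speakers, then no more button.
const fakePages = [["Speaker 1", "Speaker 2"], ["Speaker 3"], ["Speaker 4"]];
collectAll(async (n) => ({ items: fakePages[n - 1], hasMore: n < fakePages.length }))
  .then((result) => console.log(result));
```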

Example 4: Multi-Step Workflow

markdown
## Download and summarize monthly report

1. Go to https://analytics.example.com
2. Log in with the saved credentials
3. Navigate to Reports > Monthly Summary
4. Select the previous month from the date picker
5. Click "Generate Report"
6. Wait for the report to finish generating (may take up to 30 seconds)
7. Take a screenshot of the summary section at the top
8. Read the following metrics from the report:
   - Total visits
   - Unique visitors
   - Top 5 pages by views
   - Conversion rate
9. Report these metrics back to me in a summary

Debugging Agent Automation

When the agent cannot complete a task, these techniques help.

When the Agent Cannot Find an Element

Symptom: The agent says it cannot find a button, link, or field.

Fix: Ask the agent to take a screenshot first, then describe what you see:

"Take a screenshot of the current page. I'll look at it and tell you where the button is."

After seeing the screenshot:

"The login button is in the top-right corner. It's a small text link that says 'Sign In', not a big button."

More specific descriptions help the agent find elements on retry.

When Pages Load Slowly

Symptom: The agent interacts with a page before it finishes loading, causing errors.

Fix: Add explicit wait instructions:

markdown
5. Click "Load Data"
6. Wait at least 5 seconds for the data table to appear
7. Take a screenshot to verify the table is visible before proceeding

When the Agent Gets Confused

Symptom: The agent takes a wrong action or navigates to the wrong page.

Fix: Add screenshot checkpoints after key steps:

markdown
3. Click "Reports" in the left sidebar
4. Take a screenshot — you should see a list of available reports
5. If you don't see the report list, click "Reports" in the top navigation bar instead

When Pages Change Layout

Symptom: Instructions that used to work stop working.

Fix: Write instructions that rely on text labels and visual descriptions rather than positions:

Instead of:

"Click the third item in the navigation bar"

Write:

"Click the link labeled 'Reports' in the navigation bar"

Text-based instructions survive layout changes better than position-based ones.
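The difference shows up even with made-up data. Below, a site adds a "Pricing" link to a hypothetical navigation bar: "the third item" now points somewhere else, while "the link labeled 'Reports'" still finds the right one.

```javascript
// Made-up navigation bars before and after a site adds a "Pricing" link.
const before = ["Home", "Products", "Reports", "Settings"];
const after = ["Home", "Products", "Pricing", "Reports", "Settings"];

// "Click the third item in the navigation bar"
const byPosition = (items, n) => items[n - 1];
// "Click the link labeled 'Reports' in the navigation bar"
const byLabel = (items, label) => items.find((item) => item === label) ?? null;

console.log(byPosition(before, 3), "->", byPosition(after, 3)); // Reports -> Pricing
console.log(byLabel(before, "Reports"), "->", byLabel(after, "Reports")); // Reports -> Reports
```

Playwright scripts face the same trade-off: page.getByRole('link', { name: 'Reports' }) targets the label, while page.locator('nav a').nth(2) targets a position.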

The Screenshot-First Strategy

When something is not working, always start with:

"Take a screenshot of what you currently see."

This tells you what the agent is seeing, which is often different from what you expect. Maybe the page did not load, maybe a popup is blocking the content, maybe you are on the wrong page entirely.


Ethics, Limits, and Responsible Automation

Agent Clicking = You Clicking

When an AI agent clicks a button on a website, it is the same as you clicking it. Everything that applies to manual browsing applies to automated browsing:

  • Terms of Service still apply. If a website prohibits automated access, using an agent to access it violates those terms just as much as a script would.
  • Account actions are real. If the agent clicks "Delete" or "Submit Order", that action happens for real.
  • You are responsible for everything the agent does on your behalf.

When NOT to Automate

Do not use browser automation for:

  • Sites that prohibit it — read the Terms of Service before automating
  • Bypassing access controls — do not automate around login walls for content you do not have access to
  • High-frequency scraping — hammering a website with rapid requests harms the service
  • CAPTCHA circumvention — CAPTCHAs exist to prevent automation; respect that
  • Anything that gives you an unfair advantage — ticket scalping, limited-stock buying bots, etc.

If a website offers an API or export feature, use that instead. Automation is a fallback for when no official method exists.

Rate Limiting Through Instructions

Be a good internet citizen. Add pauses to your instructions:

markdown
For each item in the list:
1. Click on the item
2. Read the details
3. Go back to the list
4. Wait 3-5 seconds before clicking the next item

Do not run the same automation more frequently than necessary. If checking once a day is enough, do not check every hour.
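The pause in step 4 amounts to a loop that visits one item at a time and waits 3 to 5 seconds between visits, never after the last one. A minimal sketch: visit is a hypothetical callback standing in for "click the item, read the details, go back", and the wait function is injectable so the logic can be checked without real delays.

```javascript
// Sketch: visit items one at a time, pausing 3-5 seconds between them.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function randomDelayMs(minMs, maxMs) {
  return minMs + Math.random() * (maxMs - minMs);
}

async function visitPolitely(items, visit, { minMs = 3000, maxMs = 5000, wait = sleep } = {}) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await visit(items[i]));
    // Pause before the next item, but not after the last one.
    if (i < items.length - 1) await wait(randomDelayMs(minMs, maxMs));
  }
  return results;
}

// Usage (real delays): visitPolitely(["Item A", "Item B"], async (item) => `read ${item}`).then(console.log);
```

Strictly one item at a time matters as much as the pause itself: a site never sees more than one request in flight from you.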

Personal Data Awareness

Be careful with automation involving personal information:

  • Never automate with sensitive data (banking, medical records) on shared or untrusted computers
  • Credentials: If your instructions include usernames or passwords, treat the instruction document as sensitive — do not commit it to public repositories
  • Scraped data may include personal information — handle it according to privacy regulations
  • Public data is generally safer to extract, but copyright still applies

The Ethical Test

Before automating any task, ask:

  1. Does the website allow this?
  2. Would I be comfortable if the website owner saw what I am doing?
  3. Am I being respectful of the website's resources?
  4. Am I handling any personal data responsibly?

If the answer to any of these is "no," reconsider.


Mini Project: Automate a Real Weekly Task

Goal

Pick a real task you do manually on the web every week. Write an instruction document for it. Run it through the agent. Refine until it works reliably. Save it for reuse.

Steps

1. Choose a task

Good candidates:

  • Checking a dashboard for specific numbers
  • Downloading a weekly report
  • Filling out a recurring form
  • Collecting information from a few pages
  • Taking screenshots for documentation

2. Write your instruction document

Create a Markdown file:

markdown
## [Task Name]

### Goal
[What this automation should accomplish]

### Steps
1. Go to [URL]
2. [Next action]
3. [Next action]
...

### Expected Result
[What success looks like]

### Notes
[Any credentials, timing considerations, or edge cases]
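Before saving instruction documents to your library, a quick consistency check can catch a missing section or a skipped step number. The sketch below is a hypothetical helper, not part of any tool; it assumes the four "###" section headings from the template above, and the numbering rule is its own assumption.

```javascript
// Hypothetical checker for instruction documents that follow the template above.
const REQUIRED_SECTIONS = ["Goal", "Steps", "Expected Result", "Notes"];

function checkInstructionDoc(markdown) {
  const problems = [];
  const lines = markdown.split(/\r?\n/);

  for (const section of REQUIRED_SECTIONS) {
    if (!lines.some((line) => line.trim() === `### ${section}`)) {
      problems.push(`missing section: ${section}`);
    }
  }

  // Top-level numbered steps ("1. ", "2. ", ...) should count up without gaps.
  const numbers = lines
    .map((line) => line.match(/^(\d+)\.\s/))
    .filter(Boolean)
    .map((match) => Number(match[1]));
  if (numbers.length === 0) problems.push("no numbered steps");
  const firstGap = numbers.findIndex((n, i) => n !== i + 1);
  if (firstGap !== -1) problems.push(`step ${firstGap + 1} is numbered ${numbers[firstGap]}`);

  return problems;
}

// Usage (Node.js): check one file from your automations folder.
// const fs = require("node:fs");
// console.log(checkInstructionDoc(fs.readFileSync("reference/automations/pricing-check.md", "utf8")));
```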

3. Run it through the agent

Give your instruction document to the agent and watch what happens. Note where it succeeds and where it struggles.

4. Refine

Based on what you observed:

  • Add more detail where the agent got confused
  • Add wait conditions where timing was an issue
  • Add screenshots at checkpoint steps
  • Simplify steps that were overly complicated

5. Save for reuse

Store your refined instruction document in your workspace (from Specialization 1):

reference/
└── automations/
    ├── weekly-report-download.md
    ├── status-form-submission.md
    └── pricing-check.md

You now have a library of reusable browser automations — no code required.


Key Takeaways

  • Agentic browser automation replaces scripts with plain-language instructions
  • The core skill is writing clear, specific instructions — not programming
  • Playwright MCP gives the agent browser control; you describe what should happen
  • Structure instructions as numbered steps with visual descriptions, wait conditions, and expected outcomes
  • Screenshot checkpoints are your best debugging tool
  • Agent clicking = you clicking — Terms of Service and ethics still apply
  • Start with one real task, refine the instructions, and build a reusable library

Next Steps

Want to explore AI automation for your business?

SFLOW helps Belgian companies implement practical AI solutions - from automated workflows to custom integrations with your existing systems.


SFLOW BV - Polderstraat 37, 2491 Balen-Olmen