Specialization 3: Agentic Browser Automation
Write instructions, not scripts — tell an AI agent what to do in the browser, and it does it.
INFO
- Time: ~3-4 hours
- Difficulty: Beginner-Intermediate
- Prerequisites: Core Modules completed
This Page Covers
- How agentic browser automation differs from traditional scripting
- How Playwright MCP gives an agent browser control
- Setting up Playwright MCP
- Writing effective agent instructions (the core skill)
- Practical examples as instruction documents
- Debugging when the agent gets stuck
- Ethics, limits, and responsible automation
The Shift: From Scripts to Instructions
The Old Way
Traditional browser automation meant learning a programming language. You would:
- Install Node.js and Playwright (or Selenium, or Puppeteer)
- Learn the JavaScript API for browser control
- Write code to find elements using CSS selectors
- Debug timing issues, broken selectors, and flaky tests
- Maintain scripts when websites change
This was powerful but had a high barrier to entry. Most people who could benefit from browser automation never got past learning the API.
The New Way
With agentic browser automation, you write a structured document describing what you want to happen. The AI agent reads your instructions, controls the browser, and reports the results.
You do not write code. You do not learn CSS selectors. You do not debug JavaScript. Instead, you describe what you see on the screen and what should happen, and the agent figures out the implementation.
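Example of the old way (JavaScript):
// Assumes the "playwright" package is installed and the file runs as an ES module
import { chromium } from 'playwright';
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com/login');
await page.fill('input[name="email"]', 'user@example.com');
await page.fill('input[name="password"]', 'mypassword');
await page.click('button[type="submit"]');
await page.waitForURL('**/dashboard');
await browser.close();
Example of the new way (instructions):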
1. Go to https://example.com/login
2. Enter "user@example.com" in the email field
3. Enter the password in the password field
4. Click the "Sign In" button
5. Wait until the dashboard page loads
6. Take a screenshot to confirm
The skill shifts from "knowing a programming API" to "writing clear instructions for an agent."
Honest Limitations
Agentic browser automation is not magic. It works well for:
- Straightforward page interactions (clicks, typing, navigation)
- Pages with clear, labeled UI elements
- Repetitive tasks with predictable steps
It struggles with:
- Complex dynamic pages (heavy JavaScript, infinite scroll)
- Unusual UI patterns (custom drag-and-drop, canvas-based interfaces)
- CAPTCHA and bot-detection systems
- Tasks requiring pixel-perfect precision
When the agent gets confused, you can usually fix it by writing more specific instructions. We cover debugging techniques later.
How Agentic Browser Automation Works
Playwright MCP
Playwright is browser automation software created by Microsoft. Playwright MCP is an MCP server that gives an AI agent the ability to control a browser through Playwright — without you writing any Playwright code.
The flow:
- You write instructions describing what should happen
- The agent reads your instructions and interprets each step
- Playwright MCP translates the agent's actions into browser commands
- The browser executes the commands (click, type, navigate, screenshot)
- The agent observes the result and proceeds to the next step
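The loop above can be sketched in a few lines of JavaScript. This is a toy model under stated assumptions, not how Playwright MCP or the agent is implemented: `createFakeBrowser`, `interpret`, and `runInstructions` are hypothetical names, the fake browser only records calls, and a real agent uses a language model where this sketch uses pattern matching.

```javascript
// Hypothetical stand-in for the browser side of the loop. A real setup would
// route these calls through Playwright MCP; here they only update an object.
function createFakeBrowser() {
  const state = { url: "about:blank", log: [] };
  return {
    state,
    // Each tool performs one action and returns what the agent observes next.
    navigate(url) {
      state.url = url;
      state.log.push(`navigate ${url}`);
      return `Page loaded: ${url}`;
    },
    click(label) {
      state.log.push(`click ${label}`);
      return `Clicked "${label}"`;
    },
    screenshot() {
      state.log.push("screenshot");
      return `Screenshot of ${state.url}`;
    },
  };
}

// Stand-in for the agent's interpretation step: map one plain-language
// instruction to a tool call. A real agent uses a language model here.
function interpret(step) {
  let m;
  if ((m = step.match(/^Go to (\S+)$/))) return { tool: "navigate", arg: m[1] };
  if ((m = step.match(/^Click "(.+)"$/))) return { tool: "click", arg: m[1] };
  if (/^Take a screenshot$/.test(step)) return { tool: "screenshot" };
  return null; // this toy interpreter does not understand the step
}

// The loop: interpret a step, act through the browser, observe, continue.
function runInstructions(steps, browser) {
  const observations = [];
  for (const step of steps) {
    const action = interpret(step);
    if (!action) {
      observations.push(`Could not interpret: ${step}`);
      break; // stop and report instead of guessing
    }
    observations.push(browser[action.tool](action.arg));
  }
  return observations;
}

const browser = createFakeBrowser();
const report = runInstructions(
  ["Go to https://example.com", 'Click "More information"', "Take a screenshot"],
  browser
);
console.log(report.join("\n"));
```

The point of the sketch is the shape of the loop: one instruction, one action, one observation, and a stop with a report when a step cannot be interpreted.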
What the Agent Can Do
Through Playwright MCP, the agent can:
- Navigate — go to URLs, click links, go back/forward
- Read — see page content, read text from elements
- Interact — click buttons, fill forms, select dropdowns, check boxes
- Wait — pause until elements appear or pages load
- Capture — take screenshots, extract text or data from tables
- Evaluate — run simple checks ("does the page contain this text?")
What the Agent Cannot Do
- Bypass CAPTCHAs or bot detection
- Interact with native OS dialogs (file pickers, print dialogs)
- Work with browser extensions
- Handle complex drag-and-drop or canvas-based UIs reliably
- Maintain sessions across separate agent conversations (unless you handle cookies)
Setting Up Playwright MCP
Prerequisites
- Node.js installed (nodejs.org — LTS version)
- Claude Desktop or Claude Code installed
Option A: Claude Code
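Add Playwright MCP to your project by creating a .mcp.json file in your project root with the contents below. (To make the server available in all your projects, register it at user scope with the claude mcp add command instead; Claude Code does not read MCP servers from ~/.claude/settings.json.)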
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}
After saving, restart Claude Code so it picks up the new MCP server.
Option B: Claude Desktop
Add Playwright MCP to your Claude Desktop configuration file:
On Mac (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}
On Windows (%APPDATA%\Claude\claude_desktop_config.json):
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp"]
    }
  }
}
After saving, restart Claude Desktop completely (quit and reopen, not just close the window).
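If the server does not show up after a restart, one thing worth ruling out is a malformed config file, since a single trailing comma makes the JSON invalid. The following is a minimal sketch of such a check in Node.js. `checkMcpConfig` is a hypothetical helper, not part of Claude or Playwright MCP, and it only checks the shape of the `playwright` entry shown above.

```javascript
// Validate the text of an MCP config file and return a list of problems.
// An empty list means the "playwright" entry has the shape shown above.
function checkMcpConfig(text) {
  let config;
  try {
    config = JSON.parse(text);
  } catch (err) {
    return [`Not valid JSON: ${err.message}`];
  }
  const server = config?.mcpServers?.playwright;
  if (!server) {
    return ['Missing "mcpServers.playwright" entry'];
  }
  const problems = [];
  if (typeof server.command !== "string") {
    problems.push('"command" must be a string such as "npx"');
  }
  const namesPackage =
    Array.isArray(server.args) &&
    server.args.some((arg) => String(arg).startsWith("@playwright/mcp"));
  if (!namesPackage) {
    problems.push('"args" must be an array that names "@playwright/mcp"');
  }
  return problems;
}

const good = `{
  "mcpServers": {
    "playwright": { "command": "npx", "args": ["-y", "@playwright/mcp"] }
  }
}`;
// Same idea, but with a trailing comma that JSON does not allow.
const broken = `{ "mcpServers": { "playwright": { "command": "npx", } } }`;

console.log(checkMcpConfig(good));
console.log(checkMcpConfig(broken));
```

To check a real file, you could pass in the result of `fs.readFileSync(path, "utf8")` for whichever config path applies to your setup.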
Verify It Works
Test with a simple instruction:
"Open https://example.com and take a screenshot."
If the agent opens a browser, navigates to the page, and returns a screenshot, your setup is working.
First run downloads a browser
The first time Playwright MCP runs, it may need to download a browser (roughly 150 MB). This is a one-time download, and subsequent runs start immediately. If the first launch seems to hang for a minute, this is the likely reason.
Running Your First Automation
Setup is done. Before we get into writing longer instruction documents, let's see what the actual experience looks like — you type a plain-language instruction, and the agent does the rest.
A Concrete Example
Open Claude Desktop (or Claude Code) with Playwright MCP configured. Type the following into the chat:
Go to https://news.ycombinator.com. Read the top 5 headlines and tell me their titles and point counts.
That is it. No special syntax, no file to reference, no command to run. You type your request in plain language, just like you would ask a colleague.
Here is what the agent does behind the scenes:
- Launches a browser — Playwright MCP opens a Chromium window (you may see it flash open)
- Navigates to the page — the agent goes to the URL you specified
- Reads the page content — the agent inspects the page to find the headlines
- Extracts the data — it pulls out the titles and point counts
- Reports back — you get a structured response in the chat, like:
Here are the top 5 headlines from Hacker News:
- "Show HN: I built a tool for..." — 342 points
- "Why databases should..." — 289 points
- ...
The entire interaction happens in the chat window. The agent handles the browser; you just read the result.
What You Can Try Next
Once that first example works, try a few more single-instruction automations to build confidence:
- "Go to https://example.com and take a screenshot"
- "Open https://en.wikipedia.org/wiki/Main_Page and tell me what today's featured article is about"
- "Go to https://weather.gov, search for your zip code, and tell me tomorrow's forecast"
Each of these is a one-shot instruction. You type it, the agent does it, you see the result.
From Chat Instructions to Instruction Documents
For quick tasks, typing directly into the chat works well. But as your automations get more complex — multi-step workflows, login flows, branching logic — writing your instructions in a Markdown file first has real advantages:
- You can review and refine before running
- You can reuse the same instructions next week
- You can share them with colleagues
- You can version-control them in your workspace (from Specialization 1)
The next section teaches you how to write these structured instruction documents.
Writing Effective Agent Instructions
This is the core skill of agentic browser automation. The quality of your instructions directly determines whether the agent succeeds or fails.
Structure as Numbered Steps
Always write instructions as a clear numbered sequence:
## Check inventory on supplier website
1. Go to https://supplier.example.com
2. Click the "Login" button in the top right
3. Enter the username in the email field
4. Enter the password in the password field
5. Click "Sign In"
6. Navigate to the "Inventory" section using the left sidebar
7. Search for "Widget A" using the search box
8. Note the current stock number shown next to "Widget A"
9. Take a screenshot of the results
10. Report the stock number back to me
Be Specific About Visual Elements
Do not reference code or technical selectors. Describe what you see on the screen:
Too vague:
"Click the button"
Good:
"Click the blue 'Submit' button at the bottom of the form"
Also good:
"Click the 'Sign In' link in the navigation bar at the top of the page"
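By default, Playwright MCP gives the agent a structured snapshot of the page (element roles and text labels) rather than raw pixels, so the visible text of an element is usually the most reliable cue. Give the agent enough detail to identify the right element.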
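The contrast between the vague and the good examples above can be turned into a rough self-check before you run an instruction document. The sketch below is a heuristic only, and `findVagueSteps` is a hypothetical helper: it flags numbered steps that start with "click", "select", or "check" but quote no visible label.

```javascript
// Flag numbered steps that name an action but quote no visible label.
// Heuristic only: it looks for text in single or double quotes, so an
// apostrophe pair in ordinary prose can hide a vague step.
function findVagueSteps(instructions) {
  const warnings = [];
  for (const line of instructions.split("\n")) {
    const match = line.match(/^\s*(\d+)\.\s+(.*)$/);
    if (!match) continue; // not a numbered step
    const [, number, text] = match;
    const isAction = /^(click|select|check)\b/i.test(text);
    const hasQuotedLabel = /"[^"]+"|'[^']+'/.test(text);
    if (isAction && !hasQuotedLabel) {
      warnings.push(`Step ${number} names no visible label: ${text}`);
    }
  }
  return warnings;
}

const draft = [
  "1. Go to https://example.com",
  "2. Click the button",
  "3. Click the blue 'Submit' button at the bottom of the form",
].join("\n");

console.log(findVagueSteps(draft));
```

Only step 2 is flagged here, because it is the only action step that never says which text the agent should look for.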
Include Wait Conditions
Web pages take time to load. Tell the agent when to wait:
5. Click "Generate Report"
6. Wait until the loading spinner disappears and the report table is visible
7. Take a screenshot of the report
Without wait conditions, the agent might try to read or interact with content that has not loaded yet.
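For readers curious what "wait until" means in practice, the idea is to re-check a condition until it holds or a time limit passes, rather than sleeping for a fixed time. The sketch below illustrates that idea with a simulated page. It is an assumption-laden toy: `waitUntil`, `createFakePage`, and `demo` are hypothetical names, and this is not how Playwright MCP implements waiting.

```javascript
// Poll a condition until it is true or the timeout passes.
// Returns true if the condition held in time, false otherwise.
async function waitUntil(condition, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated page: the spinner disappears and the report table becomes
// visible 50 ms after the page object is created.
function createFakePage() {
  const page = { spinnerVisible: true, tableVisible: false };
  setTimeout(() => {
    page.spinnerVisible = false;
    page.tableVisible = true;
  }, 50);
  return page;
}

async function demo() {
  const page = createFakePage();
  const ready = await waitUntil(
    () => !page.spinnerVisible && page.tableVisible,
    { timeoutMs: 2000, intervalMs: 10 }
  );
  return ready ? "report table is visible" : "timed out waiting for the report";
}

demo().then(console.log);
```

A condition-based wait finishes as soon as the page is ready and fails clearly when it never becomes ready, which is why "wait until the spinner disappears" is a better instruction than "wait 10 seconds".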
Include Branching Instructions
Real workflows have conditions. Handle them:
7. Look for the "In Stock" or "Out of Stock" label
8. If the label says "Out of Stock": take a screenshot, report "out of stock", and stop
9. If the label says "In Stock": note the quantity and continue with the next step
Describe Expected Outcomes
Tell the agent what success looks like:
10. After submission, you should see a confirmation message that says "Your order has been placed"
11. If you see an error message instead, take a screenshot and report the error text
Common Instruction Patterns
Login flow:
1. Go to [URL]
2. Enter [username] in the email/username field
3. Enter [password] in the password field
4. Click the "Sign In" / "Login" button
5. Wait until the dashboard or home page loads
6. Take a screenshot to confirm successful login
Form filling:
1. Go to [form URL]
2. Fill in the "Name" field with [value]
3. Fill in the "Email" field with [value]
4. Select [option] from the "Category" dropdown
5. In the "Message" text area, type: [text]
6. Check the "I agree to terms" checkbox
7. Take a screenshot of the filled form before submitting
8. Click the "Submit" button
9. Wait for the confirmation page to load
Data extraction:
1. Go to [URL]
2. Wait for the data table to load
3. Read all rows from the table
4. For each row, capture: [column A], [column B], [column C]
5. Report the data in a formatted list
Screenshot sequence:
1. Go to [URL]
2. Take a screenshot of the full page
3. Click on [section/tab]
4. Wait for the content to load
5. Take a screenshot
6. Repeat for [next section/tab]
Practical Examples
Each example is a complete instruction document you could give to an agent.
Example 1: Checking a Page for Changes
## Check pricing page for changes
Goal: Compare the current pricing on example.com with what I saw last time.
1. Go to https://example.com/pricing
2. Wait for the pricing cards to load
3. Take a screenshot of the full pricing page
4. Read the price listed for each plan (Starter, Pro, Enterprise)
5. Report the prices in this format:
   - Starter: $X/month
   - Pro: $X/month
   - Enterprise: $X/month
6. Note any differences from these previous values:
   - Starter was $9/month
   - Pro was $29/month
   - Enterprise was $99/month
Example 2: Filling Out a Form
## Submit weekly status report
1. Go to https://intranet.example.com/status-form
2. Log in if needed (username: me@company.com)
3. In the "Week of" field, enter this week's Monday date
4. In the "Completed" text area, enter:
   - Finished API documentation
   - Deployed v2.3 to staging
   - Resolved 3 support tickets
5. In the "Next Week" text area, enter:
   - Begin database migration
   - Team planning session
6. Select "On Track" from the "Project Status" dropdown
7. Take a screenshot of the completed form
8. Do NOT click Submit. I will review the screenshot first
Example 3: Extracting Data From a Table
## Extract conference speaker data
1. Go to https://conference.example.com/speakers
2. Wait for the speaker cards to load
3. For each speaker on the page, capture:
   - Name
   - Company
   - Talk title
   - Time slot
4. If there is a "Load More" or pagination button, click it and capture additional speakers
5. Continue until all speakers are captured
6. Report the complete list in a table format
Example 4: Multi-Step Workflow
## Download and summarize monthly report
1. Go to https://analytics.example.com
2. Log in with the saved credentials
3. Navigate to Reports > Monthly Summary
4. Select the previous month from the date picker
5. Click "Generate Report"
6. Wait for the report to finish generating (may take up to 30 seconds)
7. Take a screenshot of the summary section at the top
8. Read the following metrics from the report:
   - Total visits
   - Unique visitors
   - Top 5 pages by views
   - Conversion rate
9. Report these metrics back to me in a summary
Debugging Agent Automation
When the agent cannot complete a task, these techniques help.
When the Agent Cannot Find an Element
Symptom: The agent says it cannot find a button, link, or field.
Fix: Ask the agent to take a screenshot first, then describe what you see:
"Take a screenshot of the current page. I'll look at it and tell you where the button is."
After seeing the screenshot:
"The login button is in the top-right corner. It's a small text link that says 'Sign In', not a big button."
More specific descriptions help the agent find elements on retry.
When Pages Load Slowly
Symptom: The agent interacts with a page before it finishes loading, causing errors.
Fix: Add explicit wait instructions:
5. Click "Load Data"
6. Wait at least 5 seconds for the data table to appear
7. Take a screenshot to verify the table is visible before proceeding
When the Agent Gets Confused
Symptom: The agent takes a wrong action or navigates to the wrong page.
Fix: Add screenshot checkpoints after key steps:
3. Click "Reports" in the left sidebar
4. Take a screenshot — you should see a list of available reports
5. If you don't see the report list, click "Reports" in the top navigation bar instead
When Pages Change Layout
Symptom: Instructions that used to work stop working.
Fix: Write instructions that rely on text labels and visual descriptions rather than positions:
Instead of:
"Click the third item in the navigation bar"
Write:
"Click the link labeled 'Reports' in the navigation bar"
Text-based instructions survive layout changes better than position-based ones.
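A small simulation shows why. The sketch below models a navigation bar as a plain array of labels (an assumption made for illustration; the item names other than "Reports" are invented) and compares a position-based lookup with a label-based one after the site adds a new item.

```javascript
// Two versions of the same navigation bar: the second adds a new item.
const navBefore = ["Home", "Dashboard", "Reports", "Settings"];
const navAfter = ["Home", "Dashboard", "Alerts", "Reports", "Settings"];

// Position-based: "click the third item in the navigation bar".
function findByPosition(nav, position) {
  return nav[position - 1]; // positions in instructions are 1-based
}

// Label-based: "click the link labeled 'Reports'".
function findByLabel(nav, label) {
  return nav.find((item) => item === label);
}

console.log(findByPosition(navBefore, 3)); // Reports
console.log(findByPosition(navAfter, 3)); // Alerts (wrong target after the change)
console.log(findByLabel(navAfter, "Reports")); // Reports
```

The position-based instruction silently picks a different link once the layout shifts, while the label-based one still lands on "Reports".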
The Screenshot-First Strategy
When something is not working, always start with:
"Take a screenshot of what you currently see."
This tells you what the agent is seeing, which is often different from what you expect. Maybe the page did not load, maybe a popup is blocking the content, maybe you are on the wrong page entirely.
Ethics, Limits, and Responsible Automation
Agent Clicking = You Clicking
When an AI agent clicks a button on a website, it is the same as you clicking it. Everything that applies to manual browsing applies to automated browsing:
- Terms of Service still apply. If a website prohibits automated access, using an agent to access it violates those terms just as much as a script would.
- Account actions are real. If the agent clicks "Delete" or "Submit Order", that action happens for real.
- You are responsible for everything the agent does on your behalf.
When NOT to Automate
Do not use browser automation for:
- Sites that prohibit it — read the Terms of Service before automating
- Bypassing access controls — do not automate around login walls for content you do not have access to
- High-frequency scraping — hammering a website with rapid requests harms the service
- CAPTCHA circumvention — CAPTCHAs exist to prevent automation; respect that
- Anything that gives you an unfair advantage — ticket scalping, limited-stock buying bots, etc.
If a website offers an API or export feature, use that instead. Automation is a fallback for when no official method exists.
Rate Limiting Through Instructions
Be a good internet citizen. Add pauses to your instructions:
For each item in the list:
1. Click on the item
2. Read the details
3. Go back to the list
4. Wait 3-5 seconds before clicking the next item
Do not run the same automation more frequently than necessary. If checking once a day is enough, do not check every hour.
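The "wait 3-5 seconds" step amounts to a randomized pause between items. If you ever need the same pacing in a script of your own, a minimal sketch might look like the following. `pauseMs`, `realSleep`, and `visitEach` are hypothetical names, and the example passes in a recording sleep function so that nothing actually waits when you run it.

```javascript
// Pick a pause between 3 and 5 seconds, in milliseconds.
// `random` is injectable so the function can be checked deterministically.
function pauseMs(random = Math.random) {
  const minMs = 3000;
  const maxMs = 5000;
  return Math.round(minMs + random() * (maxMs - minMs));
}

const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit items one at a time, pausing between them but not after the last one.
async function visitEach(items, visit, sleep = realSleep) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await visit(items[i]));
    if (i < items.length - 1) {
      await sleep(pauseMs());
    }
  }
  return results;
}

// Example with a recording sleep, so nothing actually waits here.
const pauses = [];
visitEach(
  ["Item A", "Item B", "Item C"],
  async (item) => `read details of ${item}`,
  async (ms) => { pauses.push(ms); }
).then((results) => {
  console.log(results);
  console.log(`pauses taken: ${pauses.length}`);
});
```

Visiting items strictly one after another, with a pause in between, keeps the load on the site close to what a person browsing by hand would generate.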
Personal Data Awareness
Be careful with automation involving personal information:
- Never automate with sensitive data (banking, medical records) on shared or untrusted computers
- Credentials: If your instructions include usernames or passwords, treat the instruction document as sensitive — do not commit it to public repositories
- Scraped data may include personal information — handle it according to privacy regulations
- Public data is generally safer to extract, but copyright still applies
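One way to keep credentials out of a saved instruction document is to store it with placeholders such as [username] and [password], as in the login pattern earlier, and fill them in only when you run it. The sketch below assumes that approach. `fillTemplate` is a hypothetical helper and the values are dummies; the filled-in text is itself sensitive once it exists.

```javascript
// Replace [placeholder] markers in an instruction template with real values.
// Throws if a placeholder has no value, so a half-filled document is never used.
function fillTemplate(template, values) {
  return template.replace(/\[([^\]]+)\]/g, (marker, key) => {
    if (!Object.hasOwn(values, key)) {
      throw new Error(`No value provided for ${marker}`);
    }
    return values[key];
  });
}

const template = [
  "1. Go to [URL]",
  "2. Enter [username] in the email/username field",
  "3. Enter [password] in the password field",
].join("\n");

// In practice the values could come from environment variables such as
// process.env.SITE_PASSWORD (a hypothetical name); literals keep this runnable.
const filled = fillTemplate(template, {
  URL: "https://example.com/login",
  username: "user@example.com",
  password: "not-a-real-password",
});
console.log(filled);
```

With this split, the template can be committed and shared, while the values stay on your machine.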
The Ethical Test
Before automating any task, ask:
- Does the website allow this?
- Would I be comfortable if the website owner saw what I am doing?
- Am I being respectful of the website's resources?
- Am I handling any personal data responsibly?
If the answer to any of these is "no," reconsider.
Mini Project: Automate a Real Weekly Task
Goal
Pick a real task you do manually on the web every week. Write an instruction document for it. Run it through the agent. Refine until it works reliably. Save it for reuse.
Steps
1. Choose a task
Good candidates:
- Checking a dashboard for specific numbers
- Downloading a weekly report
- Filling out a recurring form
- Collecting information from a few pages
- Taking screenshots for documentation
2. Write your instruction document
Create a Markdown file:
## [Task Name]
### Goal
[What this automation should accomplish]
### Steps
1. Go to [URL]
2. [Next action]
3. [Next action]
...
### Expected Result
[What success looks like]
### Notes
[Any credentials, timing considerations, or edge cases]
3. Run it through the agent
Give your instruction document to the agent and watch what happens. Note where it succeeds and where it struggles.
4. Refine
Based on what you observed:
- Add more detail where the agent got confused
- Add wait conditions where timing was an issue
- Add screenshots at checkpoint steps
- Simplify steps that were overly complicated
5. Save for reuse
Store your refined instruction document in your workspace (from Specialization 1):
reference/
└── automations/
    ├── weekly-report-download.md
    ├── status-form-submission.md
    └── pricing-check.md
You now have a library of reusable browser automations, with no code required.
Key Takeaways
- Agentic browser automation replaces scripts with plain-language instructions
- The core skill is writing clear, specific instructions — not programming
- Playwright MCP gives the agent browser control; you describe what should happen
- Structure instructions as numbered steps with visual descriptions, wait conditions, and expected outcomes
- Screenshot checkpoints are your best debugging tool
- Agent clicking = you clicking — Terms of Service and ethics still apply
- Start with one real task, refine the instructions, and build a reusable library
Next Steps
- Specialization 1: Your AI-Native Workspace — Set up the workspace where you store and manage your automation instruction documents
- Specialization 2: Connect AI to Your Systems (MCP) — Understand MCP in depth, including how Playwright MCP fits into the broader picture
