OpenClaw × Playwright CLI: Three-Stage AI Browser Automation with Zero Tokens at Runtime


The same login flow, the same form fills, the same data scraping — you’ve done these workflows before, yet every run still burns tokens on AI inference.

OpenClaw’s approach is direct: let the AI learn once, distill it into a Skill file, then run token-free forever.

TL;DR

Playwright CLI (not MCP) paired with OpenClaw Skills is one of the most token-efficient AI browser automation setups available
Three-stage workflow: AI exploration (~41% tokens) → Skill distillation (~5% tokens) → zero-token execution
A Skill is a Markdown file describing browser steps; once created, subsequent runs consume zero inference tokens
The ClawHub registry has thousands of community Skills ready to install

Why Not Playwright MCP?

Connecting Playwright MCP directly to an AI agent works, but every step requires live model inference — high token cost, high latency.

Playwright CLI is a browser control interface purpose-built for AI agents:

Wraps browser operations (navigate, click, fill, screenshot, tab management) as structured CLI commands
Roughly 4x lower token consumption compared to Playwright MCP solutions
Outputs AI-readable plain text summaries rather than raw DOM trees

The efficiency gap comes from a key design decision: MCP solutions have the model re-reason between every single step; Playwright CLI serializes browser state into compact, AI-friendly snapshots.
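To make the idea of a compact snapshot concrete, here is a minimal sketch that reduces an HTML page to one short line per interactive element. It is an illustration only: the `SnapshotBuilder` class, the `compact_snapshot` function, and the output format are assumptions made for this example, not Playwright CLI's actual snapshot format.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" for this illustration only (an assumption).
INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class SnapshotBuilder(HTMLParser):
    """Collects one short line per interactive element and ignores the rest of the DOM."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        attr_map = dict(attrs)
        # Keep only what an agent would need to target the element.
        label = attr_map.get("name") or attr_map.get("id") or attr_map.get("href") or ""
        kind = attr_map.get("type", "")
        parts = [tag, kind, label]
        self.lines.append(" ".join(p for p in parts if p))


def compact_snapshot(html: str) -> str:
    """Hypothetical helper: return a plain-text summary of interactive elements."""
    builder = SnapshotBuilder()
    builder.feed(html)
    return "\n".join(builder.lines)


page = """
<div class="wrapper"><div class="card"><form>
  <input type="text" name="username">
  <input type="password" name="password">
  <button type="submit" id="login">Sign in</button>
</form></div></div>
"""
snapshot = compact_snapshot(page)
print(snapshot)
```

The summary is three short lines, while the raw markup also carries wrapper elements the model never needs to see. On real pages that difference is presumably much larger.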

The Three-Stage Workflow

The core insight: concentrate AI inference costs in a one-time learning phase, not every execution.
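The amortization argument can be checked with simple arithmetic. The sketch below compares paying for inference on every run with paying once for exploration and distillation. All token counts are hypothetical placeholders chosen for illustration; they are not measurements from OpenClaw or Playwright CLI.

```python
import math


def total_tokens_live(tokens_per_run: int, runs: int) -> int:
    """Live inference: every run pays the full inference cost."""
    return tokens_per_run * runs


def total_tokens_skill(explore_tokens: int, distill_tokens: int, runs: int) -> int:
    """Skill approach: exploration and distillation are paid once; runs add nothing."""
    return explore_tokens + distill_tokens if runs > 0 else 0


def break_even_runs(tokens_per_run: int, explore_tokens: int, distill_tokens: int) -> int:
    """Smallest run count at which the Skill approach costs no more than live inference."""
    return math.ceil((explore_tokens + distill_tokens) / tokens_per_run)


# Hypothetical numbers: a live run and the one-time exploration cost 10,000
# tokens each, and distillation costs a further 1,200.
LIVE, EXPLORE, DISTILL = 10_000, 10_000, 1_200

print(break_even_runs(LIVE, EXPLORE, DISTILL))    # break-even run count
print(total_tokens_live(LIVE, 100))               # 100 live runs
print(total_tokens_skill(EXPLORE, DISTILL, 100))  # 100 Skill runs
```

Under these assumed numbers the Skill approach breaks even on the second run, and the gap then grows linearly with every further run.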

Stage 1: AI Exploration (~41% tokens)

Have the AI agent operate the target site once using Playwright CLI:

# Install the Playwright CLI Skill
clawhub install playwright-cli

# Run an exploratory session
openclaw "Log into the target site and retrieve today's notifications"

The AI reasons in real time — discovering UI structure, finding correct selectors, handling dynamic content. Token cost is highest here, but this happens only once.

Stage 2: Skill Distillation (~5% tokens)

After exploration, encode the workflow into a Skill file (plain Markdown):

# skill: login-and-fetch-notifications
## Description
Log into the site and retrieve the latest notifications

## Prerequisites
- playwright-cli installed

## Steps
1. playwright-cli navigate {{TARGET_URL}}
2. playwright-cli fill [name="username"] {{USERNAME}}
3. playwright-cli fill [name="password"] {{PASSWORD}}
4. playwright-cli click button[type="submit"]
5. playwright-cli wait .notification-list
6. playwright-cli snapshot .notification-list

The Skill is the “operations playbook” — every step is explicit, no AI reasoning required to follow it.
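Because the Skill is plain Markdown, pulling the steps out of it needs no model at all. The following is a minimal sketch of such a parser, assuming the layout shown above (a `## Steps` heading followed by numbered lines). The `parse_steps` function is hypothetical and does not reflect how OpenClaw itself reads Skill files.

```python
import re

SKILL_TEXT = """\
# skill: login-and-fetch-notifications
## Description
Log into the site and retrieve the latest notifications

## Steps
1. playwright-cli navigate {{TARGET_URL}}
2. playwright-cli fill [name="username"] {{USERNAME}}
3. playwright-cli click button[type="submit"]
"""

# Matches a numbered list item such as "2. some command".
STEP_LINE = re.compile(r"^\s*\d+\.\s+(.*\S)\s*$")


def parse_steps(skill_text: str) -> list[str]:
    """Return the numbered commands under '## Steps', in order."""
    steps = []
    in_steps = False
    for line in skill_text.splitlines():
        if line.startswith("## "):
            # Track whether the current section is the Steps section.
            in_steps = line[3:].strip().lower() == "steps"
            continue
        if in_steps:
            match = STEP_LINE.match(line)
            if match:
                steps.append(match.group(1))
    return steps


steps = parse_steps(SKILL_TEXT)
for step in steps:
    print(step)
```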

Stage 3: Zero-Token Execution

With the Skill in place, every subsequent run is just:

clawhub run login-and-fetch-notifications \
  --TARGET_URL=https://example.com \
  --USERNAME=me@example.com \
  --PASSWORD=***

No model inference at runtime. Playwright CLI follows the Skill script directly. Run it three times, a hundred times, or on a cron schedule — inference token cost stays at zero.
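Conceptually, a zero-inference runner is just a loop over the recorded steps. The sketch below shows that loop under stated assumptions: `run_skill` is a hypothetical function, and the executor is injected so the example runs without a browser. A real executor would hand each command to Playwright CLI; here a stand-in pretends the wait step fails, as it might after a site redesign.

```python
from typing import Callable


def run_skill(steps: list[str], execute: Callable[[str], bool]) -> int:
    """Run steps in order and return how many succeeded before the first failure."""
    completed = 0
    for step in steps:
        if not execute(step):
            break  # Stop at the first failing step; no model is consulted.
        completed += 1
    return completed


# Stand-in executor for illustration: records each command and reports
# failure for the wait step.
log: list[str] = []


def fake_execute(command: str) -> bool:
    log.append(command)
    return " wait " not in command


steps = [
    "playwright-cli navigate https://example.com",
    'playwright-cli click button[type="submit"]',
    "playwright-cli wait .notification-list",
    "playwright-cli snapshot .notification-list",
]
completed = run_skill(steps, fake_execute)
print(f"{completed} of {len(steps)} steps completed")  # prints "2 of 4 steps completed"
```

A count lower than the number of steps is a cheap signal that the Skill has gone stale and the exploration stage may need to be repeated.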

Skill Design Principles

OpenClaw Skills have a few properties worth noting:

Parameterized: {{VARIABLE}} placeholders inject runtime values, so one Skill covers multiple accounts or target URLs.
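Placeholder injection of this kind can be sketched in a few lines. The example assumes placeholders are word characters wrapped in double curly braces, as in the Skill above; `render_step` is a hypothetical helper, not part of OpenClaw.

```python
import re

# Matches {{NAME}} where NAME is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_step(step: str, values: dict[str, str]) -> str:
    """Replace each {{NAME}} with values[NAME]; raise KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for {{{{{name}}}}}")
        return values[name]

    # A function replacement inserts the value literally, so backslashes
    # in a value are not interpreted as regex escapes.
    return PLACEHOLDER.sub(substitute, step)


rendered = render_step(
    'playwright-cli fill [name="username"] {{USERNAME}}',
    {"USERNAME": "me@example.com"},
)
print(rendered)
```

Failing loudly on a missing value is a deliberate choice in this sketch: a Skill that silently typed the literal text `{{PASSWORD}}` into a form would be harder to debug.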

Composable: Skills can invoke other Skills, enabling compound workflows:

## Steps
1. skill: login-and-fetch-notifications  # calls another Skill
2. playwright-cli click .mark-all-read
3. playwright-cli screenshot cleared.png
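One way to picture composition is as inlining: each `skill:` step is replaced by the steps of the Skill it names. The sketch below does this recursively and rejects circular references. It assumes Skills are held in a simple in-memory dictionary; `expand_steps`, the registry shape, and the `clear-notifications` name are all hypothetical.

```python
def expand_steps(name: str, registry: dict[str, list[str]], _stack: tuple = ()) -> list[str]:
    """Recursively inline 'skill: <name>' steps; raise ValueError on circular references."""
    if name in _stack:
        raise ValueError("circular Skill reference: " + " -> ".join(_stack + (name,)))
    flat = []
    for step in registry[name]:
        if step.startswith("skill:"):
            child = step[len("skill:"):].strip()
            flat.extend(expand_steps(child, registry, _stack + (name,)))
        else:
            flat.append(step)
    return flat


# In-memory stand-in for a set of installed Skills.
registry = {
    "login-and-fetch-notifications": [
        "playwright-cli navigate {{TARGET_URL}}",
        'playwright-cli click button[type="submit"]',
    ],
    "clear-notifications": [
        "skill: login-and-fetch-notifications",
        "playwright-cli click .mark-all-read",
        "playwright-cli screenshot cleared.png",
    ],
}

flat = expand_steps("clear-notifications", registry)
for step in flat:
    print(step)
```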

Shareable: Publish to ClawHub for the community to install:

clawhub search "form automation"
clawhub install openclaw/playwright-cli
clawhub publish ./my-skill/

When to Use (and When Not To)

Good fit:

Scenario	Notes
Recurring data extraction	Daily price scraping, social metrics
Account operation automation	Bulk logins, form fills, notification handling
Test flow recording	Lock down QA manual paths into repeatable Skills
Cross-platform data sync	Move data between platforms with no API

Poor fit:

Tasks requiring real-time decision-making based on page content
Sites with frequently changing UI structure
Flows blocked by CAPTCHA or interactive verification

Quick Start

npm install -g openclaw
npx clawhub install playwright-cli
npx playwright install chromium

# First exploratory run (spends tokens — but only once)
openclaw "Open GitHub and screenshot my open pull requests"


🇺🇸 English


---

Every time you run a browser automation workflow — logging in, filling forms, scraping data — you're burning AI inference tokens. Same workflow, same steps, same cost, every single run. OpenClaw asks a simple question: what if the AI only had to learn it once?

That's the entire premise. Let the AI figure out the workflow once, distill it into a reusable file, then run it forever with zero tokens at runtime.

Let's walk through how that actually works.

The tool pairing here is OpenClaw with Playwright CLI — and it's worth noting this is Playwright CLI, not Playwright MCP. The difference matters. When you connect Playwright MCP directly to an AI agent, the model has to re-reason between every single browser step. That's high token cost, high latency, every time. Playwright CLI instead serializes browser state into compact, AI-friendly snapshots — structured summaries rather than raw DOM dumps. The result is roughly four times lower token consumption compared to MCP-based approaches.

Now, the workflow has three stages, and understanding the token distribution across those stages is the key insight.

Stage one is AI Exploration, which accounts for about 41% of your total token spend. You run the AI agent against the target site once. It navigates in real time, discovers the UI structure, finds the right selectors, handles dynamic content, figures everything out. Token cost is highest here — but this only happens once. Ever.

Stage two is Skill Distillation, about 5% of tokens. Once the AI has figured out the workflow, you encode it into what OpenClaw calls a Skill file. This is just a Markdown file. It describes each browser step explicitly: navigate to this URL, fill this field with this value, click this button, wait for this element to appear, take a snapshot. No AI reasoning required to follow it — it's a plain operations playbook.

Stage three is where things get interesting: zero-token execution. With the Skill file in place, every subsequent run just executes those steps directly. Playwright CLI follows the script. No model inference, no token spend. Run it three times, a hundred times, schedule it on a cron job — the inference cost stays at zero.

Skills have a few design properties worth calling out. They're parameterized, meaning you use placeholder variables for things like URLs, usernames, and passwords — so one Skill can cover multiple accounts or environments without rewriting anything. They're also composable: a Skill can call other Skills, so you can build compound workflows out of smaller, reusable pieces. And they're shareable — there's a registry called ClawHub with thousands of community Skills you can install directly.

So where does this actually shine? Recurring data extraction is the obvious one — daily price scraping, social metrics, anything you'd otherwise run on a schedule. Account operations at scale: bulk logins, form fills, notification handling. Recording QA test flows so manual paths become repeatable automated checks. And cross-platform data sync when there's no API available.

Where it breaks down: anything requiring real-time decision-making based on what's actually on the page. Sites that change their UI frequently enough that your selectors go stale. And flows blocked by CAPTCHA or interactive verification — those are walls Skills can't climb.

The mental model to take away is this: AI inference is expensive at runtime but cheap as a one-time investment. OpenClaw is essentially amortizing that cost across every future run. You pay once for the AI to learn, then you own the workflow.

Three things to remember. First, Playwright CLI's compact browser snapshots are why this is four times more efficient than MCP approaches — the model doesn't re-reason every step. Second, a Skill file is just Markdown, which means it's readable, editable, and version-controllable by any engineer on your team. And third, the ClawHub registry means you probably don't have to write many Skills from scratch — search first, install, adapt if needed.

---

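🇹🇼 Chinese

Have you ever run into this: the same login flow, the same forms, the same data scraping, all things you have already done, yet every run makes the AI reason from scratch and keeps burning tokens?

OpenClaw's answer to this problem is blunt: let the AI learn once, distill the process into a Skill file, and from then on execution never touches model inference.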

---

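First, a question many people will have: why not just use Playwright MCP?

Playwright MCP lets an AI agent control the browser in real time. There is nothing wrong with that approach, but it has a fundamental cost structure: every operation step requires the model to step in and reason, so token consumption is high and so is latency.

OpenClaw uses Playwright CLI, a lightweight interface designed for AI agents. It wraps browser operations as structured commands and outputs AI-parseable plain-text summaries rather than the entire DOM tree. This design difference gives it roughly 4x lower token consumption than MCP-based approaches.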

---

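The core architecture is a three-stage workflow, and the design logic is clear: concentrate the cost of AI inference in one-time learning rather than paying again on every run.

**Stage one is AI exploration**, which consumes roughly 41% of the total tokens. You have the AI operate the target site once with Playwright CLI: finding the right element selectors, handling dynamic loading, and working out the whole UI structure. This stage requires inference, so it burns more tokens, but it happens only once.

**Stage two is distillation**, which uses only about 5%. You organize the exploration into a Skill file written in ordinary Markdown. It describes precise steps: which URL to navigate to, which field to fill, which button to click, which element to wait for. Humans can read and edit the file directly; it is not some mysterious DSL.

**Stage three is execution**, with zero token consumption. Once the Skill exists, every later run calls the CLI directly and follows the script, never touching model inference. Whether you run it three times or a hundred, or schedule it to run automatically every day, the token cost stays at zero.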

---

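A few details of Skill design are worth mentioning.

First, parameterization. Steps can mark variables with double curly braces, such as the target URL or account credentials, and the values are injected at run time. One Skill can serve different accounts and environments without being copied several times.

Next, composability. A Skill can call other Skills to form nested automation flows. You can wrap login in one Skill, wrap the follow-up operations in another, and then combine them.

Finally, the ClawHub community. Skills can be published, installed, and searched, much like npm packages. You can install automation flows that others have already built instead of exploring from scratch each time.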

---

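Of course, this approach is not a cure-all. It fits best where tasks have a fixed structure and run repeatedly: periodically collecting competitor data, batch account operations, and locking down manual QA test paths. If the page UI is frequently overhauled, or the task requires real-time decisions based on page content, a Skill breaks easily and you have to fall back to live AI inference.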

---

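To sum up, three key points:

First, Playwright CLI uses roughly 4x fewer tokens than MCP-based approaches; the key is that it does not have the model reasoning repeatedly between every step.

Second, the essence of the three-stage workflow is to pay the inference cost once, after which execution is free forever.

Third, a Skill is plain Markdown: readable, editable, shareable, and composable. That lets "operational knowledge" be managed like code instead of being used up in one prompt history after another.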
