If I use Claude or Codex, I'm constantly screen-capping to show bad padding, alignment, or whatever. Is there a de facto way to let Claude or Codex make a tool call to see the browser or render a page for themselves?
Replies
With the latest update, Cursor can open a browser in the IDE, but in my experience a better way is to set up rules for the agent: something like a design system, but for AI.
@dm_23 Interesting. Seems like even with a good design system it would be helpful to visualize the DOM, but maybe I'm off on the actual value.
@catt_marroll As far as I've looked into it, this was the best known way. But it would be really great to have other options. Unless we're talking about test automation, of course :D
Playwright MCP is pretty much the standard now. Two main approaches:
1. Accessibility snapshot (browser_snapshot) - Returns a structured tree of the page. Better for understanding layout hierarchy, finding elements by role/label, and debugging DOM structure issues.
2. Screenshot (browser_take_screenshot) - Returns actual pixels. Use this for visual issues like padding, alignment, colors. You can screenshot the full page or a specific element.
For padding/alignment specifically, I'd use screenshot since you need to see the actual rendered output. The snapshot won't show you pixel-level spacing.
Setup is straightforward - add the Playwright MCP server to your claude_desktop_config.json or .mcp.json:
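A minimal sketch of that entry, assuming the npx launch command quoted elsewhere in this thread (`@playwright/mcp@latest`). The key `playwright` is just a label for the server and can be anything:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

If the file already has an `mcpServers` object, add the `playwright` entry inside it rather than replacing the whole file. Restart the client (or start a new session) after editing so it picks up the server.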
Then Claude Code can navigate to URLs and take screenshots/snapshots directly in the conversation. It works with local dev servers too.
Stop screen-capping: the best-in-class setup is giving the model a real browser tool, usually Playwright via MCP, so it can open the page and take its own screenshots (e.g., claude mcp add playwright npx @playwright/mcp@latest).
Hot take: if your “agent” can’t drive a browser, it’s not an agent, it’s a chat window you keep feeding JPEGs.
If you’re on Codex, the common loop is still “attach screenshots as image input” (e.g., codex -i screenshot.png ...), which works, but it’s not the same as first-class browser control.
Hello Matt. Yes, there are basically two de facto approaches now, depending on whether you want visual eyes (screenshots) or structured inspection (DOM/DevTools).
1) Best for “bad padding/alignment”: Playwright MCP or Chrome DevTools MCP
Instead of you screen-capping, the model can:
open your page
inspect DOM/layout/accessibility tree
read computed styles / bounding boxes
take screenshots when needed
Codex explicitly supports MCP servers like Playwright and Chrome DevTools.
Caveat: in some Codex setups, you may get DOM snapshots/logs but no visible GUI window (still useful for layout debugging).
When to use this: UI regressions, spacing, CSS, responsive checks, basic E2E flows.
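For Codex, MCP servers are declared in its TOML config rather than in JSON. A sketch of the Playwright entry, assuming the default config location `~/.codex/config.toml` and the same npx launch command mentioned in this thread; the table name `playwright` is an arbitrary label:

```toml
# ~/.codex/config.toml (assumed default location)
[mcp_servers.playwright]
command = "npx"
args = ["@playwright/mcp@latest"]
```

A Chrome DevTools MCP server would be registered the same way under its own table name, with whatever launch command that server documents.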
2) Best for “it should literally see what I see”: Computer Use (screenshot loop)
This is the “agent drives a browser” mode:
your code runs actions (click/type)
you return screenshots
model iterates like a human tester
OpenAI has a Computer Use tool for this.
Anthropic Claude also has a Computer Use tool (beta) that works similarly with screenshots + mouse/keyboard control.
When to use this: visual pixel issues, “does this look wrong?”, flows that require real rendering.
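The screenshot loop described above can be sketched roughly as follows. This is not the OpenAI or Anthropic API: `Action`, `run_screenshot_loop`, `FakePage`, and `scripted_model` are hypothetical names, and the model and browser are stubbed so that only the control flow is shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step requested by the model (hypothetical shape)."""
    kind: str          # "click", "type", or "done"
    payload: str = ""  # e.g. a selector or the text to type


def run_screenshot_loop(
    take_screenshot: Callable[[], bytes],
    perform: Callable[[Action], None],
    ask_model: Callable[[bytes, List[Action]], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Screenshot -> model picks an action -> your code executes it -> repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = take_screenshot()            # you return a screenshot
        action = ask_model(shot, history)   # model decides the next step
        if action.kind == "done":
            break
        perform(action)                     # your code runs the click/type
        history.append(action)
    return history


class FakePage:
    """Stand-in for a real browser; records actions instead of rendering."""

    def __init__(self) -> None:
        self.log: List[tuple] = []

    def screenshot(self) -> bytes:
        # A real implementation would return PNG bytes of the page.
        return f"frame-{len(self.log)}".encode()

    def perform(self, action: Action) -> None:
        self.log.append((action.kind, action.payload))


def scripted_model(shot: bytes, history: List[Action]) -> Action:
    """Stub model: replays a fixed script instead of looking at the pixels."""
    script = [Action("click", "#submit"), Action("type", "hello"), Action("done")]
    return script[len(history)]


page = FakePage()
history = run_screenshot_loop(page.screenshot, page.perform, scripted_model)
print(page.log)
```

In a real setup, `ask_model` would send the screenshot to the vendor's Computer Use tool and parse the returned action, and `perform` would drive an actual browser or desktop. The `max_steps` cap is there so a confused model cannot loop forever.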
Practical “best-in-class” setup
Use Playwright/DevTools MCP for day-to-day UI debugging (fast, inspectable, automatable).
Fall back to Computer Use when you need true visual verification (pixel/padding/spacing “feel”).
If you tell me your stack (local dev server? deployed URL? Chrome-only? auth/login?), I can suggest the cleanest setup (MCP vs computer-use) for your exact workflow.
For stuff like tweaking padding, I'd suggest not looking for a separate browser tool but switching to Cursor instead. It can run Claude under the hood, has a built-in terminal, and can index your CSS. You can just copy-paste the computed styles from DevTools (the Computed tab in the Elements panel), and it'll grasp the context way better than it would from a screenshot.
You can also use the Claude Code CLI with its Chrome extension: https://code.claude.com/docs/en/chrome