Matt Carroll

what is the best in class way to let claude / codex etc view the browser?

if i use claude or codex, i am constantly screen-capping to show bad padding / alignment / whatever. is there a de facto way to let claude or codex make a tool call to see the browser or render a page for themselves?

Replies

Derenik Sargsyan

With the latest update, Cursor can open a browser in the IDE, but in my experience a better approach is to give the agent a set of rules: something like a design system, but for AI.

Matt Carroll

@dm_23 interesting. seems like even with a good design system it would be helpful to visualize the DOM, but maybe i'm off about the actual value.

Derenik Sargsyan

@catt_marroll As far as I have looked into it, this was the best known approach. It would be really great to have other options, though. Unless we are talking about test automation, of course :D

Soumyadeep Mukherjee

@catt_marroll

Playwright MCP is pretty much the standard now. Two main approaches:

1. Accessibility snapshot (browser_snapshot) - Returns a structured tree of the page. Better for understanding layout hierarchy, finding elements by role/label, and debugging DOM structure issues.

2. Screenshot (browser_take_screenshot) - Returns actual pixels. Use this for visual issues like padding, alignment, colors. You can screenshot the full page or a specific element.

For padding/alignment specifically, I'd use screenshot since you need to see the actual rendered output. The snapshot won't show you pixel-level spacing.

Setup is straightforward - add the Playwright MCP server to your claude_desktop_config.json or .mcp.json:
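
The config snippet did not survive in this post, so here is a minimal sketch of what such an entry typically looks like. It assumes the usual "mcpServers" layout of MCP client config files and reuses the @playwright/mcp@latest package name quoted elsewhere in this thread. The helper name add_playwright_server is hypothetical; the script just prints the JSON so you can compare it with your own file.

```python
import json

# Entry for the Playwright MCP server. The "mcpServers" layout is an
# assumption about your client's config format; check your client's docs.
PLAYWRIGHT_SERVER = {"command": "npx", "args": ["@playwright/mcp@latest"]}


def add_playwright_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a "playwright" entry added.

    Existing servers are kept; the input dict is not modified.
    """
    servers = dict(config.get("mcpServers", {}))
    servers["playwright"] = PLAYWRIGHT_SERVER
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Start from an empty config; pass the parsed contents of your existing
    # file instead if you already have other servers registered.
    print(json.dumps(add_playwright_server({}), indent=2))
```

The printed JSON is what would go into the config file. Merging into a copy rather than overwriting keeps any servers you have already registered.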

                                                                                                                                         

Then Claude Code can navigate to URLs and take screenshots/snapshots directly in the conversation. Works with local dev servers too.

David Sitbon

You can use the Claude CLI with the Chrome extension: https://code.claude.com/docs/en/chrome

Vladimir Solovev

Stop screen-capping: the best-in-class setup is giving the model a real browser tool, usually Playwright via MCP, so it can open the page and take its own screenshots (e.g., claude mcp add playwright npx @playwright/mcp@latest).

Hot take: if your “agent” can’t drive a browser, it’s not an agent, it’s a chat window you keep feeding JPEGs.

If you’re on Codex, the common loop is still “attach screenshots as image input” (e.g., codex -i screenshot.png ...), which works, but it’s not the same as first-class browser control.

Alper Tayfur

Hello Matt, yes, there are basically two “de facto” approaches now, depending on whether you want visual eyes (screenshots) or structured inspection (DOM/DevTools).

1) Best for “bad padding/alignment”: Playwright MCP or Chrome DevTools MCP

Instead of you screen-capping, the model can:

- open your page
- inspect DOM/layout/accessibility tree
- read computed styles / bounding boxes
- take screenshots when needed

Codex explicitly supports MCP servers like Playwright and Chrome DevTools.

Caveat: in some Codex setups, you may get DOM snapshots/logs but no visible GUI window (still useful for layout debugging).

When to use this: UI regressions, spacing, CSS, responsive checks, basic E2E flows.
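
To make the "read computed styles / bounding boxes" step concrete: once the agent has the box of a container and the box of its child, checking padding is plain arithmetic. Below is a minimal sketch, assuming boxes arrive in the x/y/width/height shape that browsers report. The Box type, the function names, and the 0.5 px tolerance are all hypothetical, not part of any MCP server's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in CSS pixels: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def insets(outer: Box, inner: Box) -> dict:
    """Gap between each edge of `outer` and the matching edge of `inner`."""
    return {
        "top": inner.y - outer.y,
        "left": inner.x - outer.x,
        "right": (outer.x + outer.width) - (inner.x + inner.width),
        "bottom": (outer.y + outer.height) - (inner.y + inner.height),
    }


def uneven_sides(gaps: dict, tolerance: float = 0.5) -> list:
    """List the opposite-side pairs whose gaps differ by more than `tolerance` px."""
    problems = []
    if abs(gaps["left"] - gaps["right"]) > tolerance:
        problems.append("left/right")
    if abs(gaps["top"] - gaps["bottom"]) > tolerance:
        problems.append("top/bottom")
    return problems


if __name__ == "__main__":
    # Made-up numbers: a 200x100 card with a label that sits too far left.
    card = Box(x=0, y=0, width=200, height=100)
    label = Box(x=16, y=12, width=160, height=76)
    gaps = insets(card, label)
    print(gaps, uneven_sides(gaps))
```

Numbers like these are what let the model say "left padding is 16px, right is 24px" instead of guessing from pixels.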

2) Best for “it should literally see what I see”: Computer Use (screenshot loop)

This is the “agent drives a browser” mode:

- your code runs actions (click/type)
- you return screenshots
- model iterates like a human tester

OpenAI has a Computer Use tool for this.

Anthropic Claude also has a Computer Use tool (beta) that works similarly with screenshots + mouse/keyboard control.

When to use this: visual pixel issues, “does this look wrong?”, flows that require real rendering.
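
The screenshot loop described above can be sketched in a few lines. This is not the OpenAI or Anthropic API: decide, perform, and capture are hypothetical callables standing in for the model call, your browser driver, and your screenshot function, and the Action type is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """One step requested by the model, e.g. kind="click", target="#save"."""
    kind: str
    target: str


def run_loop(
    decide: Callable[[bytes], Optional[Action]],
    perform: Callable[[Action], None],
    capture: Callable[[], bytes],
    max_steps: int = 10,
) -> int:
    """Screenshot -> model -> action, repeated until the model returns None.

    Returns the number of actions performed. `max_steps` caps the loop so a
    confused model cannot click forever.
    """
    steps = 0
    while steps < max_steps:
        action = decide(capture())  # the model looks at the current pixels
        if action is None:          # the model says it is done
            break
        perform(action)             # your code clicks or types
        steps += 1
    return steps


if __name__ == "__main__":
    # Fake browser: the "page" is a counter and the "screenshot" is its value.
    # The fake model keeps clicking until the page reads 3.
    state = {"clicks": 0}
    steps = run_loop(
        decide=lambda shot: None if shot == b"3" else Action("click", "#more"),
        perform=lambda action: state.update(clicks=state["clicks"] + 1),
        capture=lambda: str(state["clicks"]).encode(),
    )
    print(steps)
```

Passing the three pieces in as callables keeps the loop independent of any vendor SDK, so swapping the fake model for a real one would only change decide.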

Practical “best-in-class” setup

Use Playwright/DevTools MCP for day-to-day UI debugging (fast, inspectable, automatable).

Fall back to Computer Use when you need true visual verification (pixel/padding/spacing “feel”).

If you tell me your stack (local dev server? deployed URL? Chrome-only? auth/login?), I can suggest the cleanest setup (MCP vs computer-use) for your exact workflow.

Ян Петров

For stuff like tweaking padding I’d suggest not looking for a separate browser tool but switching to Cursor instead - it can run Claude under the hood, has a built-in terminal, and has the ability to index your CSS/inspections. You can just copy-paste the Computed styles from DevTools (the Computed tab in the Elements panel), and it’ll grasp the context way better than it would from a screenshot.