
Plug and Play Browser Drivers

Open yasithdev opened this issue 6 months ago • 1 comments

Abstracted the Playwright driver out of the browser-use code, adding support for implementing and switching to non-Playwright/Patchright browser drivers.


Summary by cubic

Abstracted browser driver logic from browser session code, enabling support for plug-and-play browser drivers beyond Playwright.

  • Refactors
    • Moved all browser driver interfaces to a new browser_use/typing.py file.
    • Added a new browser_use/drivers/playwright.py driver and updated code to use driver abstraction.
    • Updated imports and type hints across the codebase to use the new driver interfaces.
    • Updated examples and tests to use the new driver abstraction.

yasithdev, May 29 '25 23:05

Thanks for working on this!

TBH I'm still on the fence about this general direction, though. I'm not convinced the added layer of complexity and indirection is worth it. It's a lot of new code, a lot of proxied function calls and APIs to keep up to date, and an extra hop for pretty much every browser API. We need significant gains to justify it, and currently I'm worried it will just slow down development and make debugging harder ("is xyz a Playwright error or an error in our Playwright wrapper?", etc.).

There are not really many viable drivers in town other than Playwright and raw CDP. If anything, we're likely to move down a level to raw CDP/WebDriver BiDi and get rid of Playwright-level drivers. I personally like Puppeteer the best, but they abandoned their Python bindings a while back.

What driver do you actually plan on using this with other than Playwright? If it's close enough to being Playwright-compatible that using it with browser-use is feasible, why not write a shim adapter in a separate library to make that driver Playwright-compatible? Surely it would be used by people outside of browser-use too?
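For illustration only, the shim-adapter idea could look roughly like the sketch below. `OtherDriver`, `navigate`, `tap`, and `PlaywrightLikePage` are hypothetical names invented for this example; the only thing borrowed from Playwright is that its `Page` exposes methods named `goto` and `click`. A real shim would have to cover far more of the API surface.

```python
import asyncio


class OtherDriver:
    """Hypothetical non-Playwright driver with its own method names."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def navigate(self, url: str) -> None:
        self.log.append(f"navigate:{url}")

    async def tap(self, css: str) -> None:
        self.log.append(f"tap:{css}")


class PlaywrightLikePage:
    """Shim that exposes goto/click and forwards them to OtherDriver."""

    def __init__(self, driver: OtherDriver) -> None:
        self._driver = driver

    async def goto(self, url: str) -> None:
        await self._driver.navigate(url)

    async def click(self, selector: str) -> None:
        await self._driver.tap(selector)


async def caller_written_for_playwright(page) -> None:
    # This code only knows the Playwright-style method names.
    await page.goto("https://example.com")
    await page.click("#login")


def run_shim_demo() -> list[str]:
    driver = OtherDriver()
    asyncio.run(caller_written_for_playwright(PlaywrightLikePage(driver)))
    return driver.log
```

Under this approach the translation layer lives in a separate library, so browser-use itself would keep calling Playwright-shaped methods.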

Can you give more insight into the end goals that this enables that aren't possible currently?

pirate, May 30 '25 01:05

@pirate totally fair concerns. This does add complexity, and you’re right to question whether it’s worth the tradeoff.

From what I understand, browser-use currently runs server-side and is closely tied to Playwright. The goal here is to break that dependency so we’re not locked into Playwright or its API shape long-term. By introducing a core abstraction, we can support other drivers like raw CDP, WebDriver BiDi, or even custom ones like a socket-based transport.

This also lays the groundwork for other types of backends. For example, we could run browser-use as a remote service and have browser extensions or third-party tools connect over HTTP/3, WebSocket, or socket.io. That would let us build day-to-day browser assistants that the general public can use as a sidekick while browsing the web. This is hard to do cleanly with Playwright baked in at the core.

If this were just about patching in another Playwright-like driver (like Patchright), a shim could work. But that keeps us in the Playwright world. What we want is to define a clean set of abstract functions at the browser-use level, so the core logic doesn’t care what driver is underneath. That gives us more freedom to evolve.
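As a rough sketch of what "a clean set of abstract functions" might mean in practice, the example below uses a `typing.Protocol`. All names here (`BrowserDriver`, `FakeDriver`, `open_and_click`) are hypothetical and are not taken from the PR's actual `browser_use/typing.py`; the fake driver only records calls so that the example runs without a browser.

```python
import asyncio
from typing import Protocol


class BrowserDriver(Protocol):
    """Hypothetical minimal driver interface; not the PR's actual API."""

    async def goto(self, url: str) -> None: ...
    async def click(self, selector: str) -> None: ...
    async def current_url(self) -> str: ...


class FakeDriver:
    """In-memory stand-in that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.calls: list[tuple[str, str]] = []

    async def goto(self, url: str) -> None:
        self.calls.append(("goto", url))
        self.url = url

    async def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    async def current_url(self) -> str:
        return self.url


async def open_and_click(driver: BrowserDriver, url: str, selector: str) -> str:
    """Core logic depends only on the protocol, not on a concrete driver."""
    await driver.goto(url)
    await driver.click(selector)
    return await driver.current_url()


def run_demo() -> tuple[str, list[tuple[str, str]]]:
    driver = FakeDriver()
    final_url = asyncio.run(open_and_click(driver, "https://example.com", "#submit"))
    return final_url, driver.calls
```

A Playwright-backed class, a raw CDP class, or a socket-based transport would each implement the same three methods. That is also where the extra hop per browser call raised earlier in the thread comes from.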

I totally get that this adds overhead, and yes, it could make debugging harder. But the idea is to trade some complexity now for more flexibility and broader use cases down the line. That said, if there’s a leaner way to get there without the extra indirection, I’m definitely open to it.

yasithdev, Jun 11 '25 08:06

> This also lays the groundwork for other types of backends. For example, we could run browser-use as a remote service and have browser extensions or third-party tools connect over HTTP/3, WebSocket or socket.io

This is already how Playwright works: it runs a local Node.js server that sends messages over a channel to the real backend, which pipes them to a browser via WebDriver BiDi, CDP, or a stdio pipe. We are also already building an event bus system that will allow any node to participate in an agent workflow, including browser extensions, other MCP servers, other browser drivers, etc., each of which presents "capabilities" (actions it can execute). I think that new design is going to bring the kind of flexibility that this PR is after, but in a different style.

We're currently focused on AI capability enablement and performance, and adding too much browser driver variability drastically slows down development. I don't think there are any other viable drivers right now anyway, so I don't want to add all this complexity "just in case". We should wait until there is an actual need for it. I have my fair share of complaints about Playwright's API design and management, but I don't think switching away from it in the Python ecosystem is currently realistic given our team priorities.

Thank you for the time you've invested in this work so far, though.
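The event bus design is not shown in this thread, so the following is only a guess at the general shape of "nodes that present capabilities". `CapabilityBus`, `register`, `capabilities`, and `dispatch` are hypothetical names, and routing to the first registered node is an arbitrary choice made to keep the example short.

```python
from typing import Any, Callable

Handler = Callable[[dict], Any]


class CapabilityBus:
    """Hypothetical registry mapping action names to handlers from any node."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[tuple[str, Handler]]] = {}

    def register(self, node: str, action: str, handler: Handler) -> None:
        # A node (extension, MCP server, driver, ...) advertises one action.
        self._handlers.setdefault(action, []).append((node, handler))

    def capabilities(self) -> dict[str, list[str]]:
        # Which nodes can execute which action.
        return {a: [n for n, _ in hs] for a, hs in self._handlers.items()}

    def dispatch(self, action: str, payload: dict) -> tuple[str, Any]:
        # Assumption: the first node registered for an action handles it.
        if action not in self._handlers:
            raise LookupError(f"no node offers action {action!r}")
        node, handler = self._handlers[action][0]
        return node, handler(payload)


bus = CapabilityBus()
bus.register("extension", "click", lambda p: f"clicked {p['selector']}")
bus.register("playwright", "click", lambda p: f"pw clicked {p['selector']}")
bus.register("playwright", "goto", lambda p: f"opened {p['url']}")
```

In this style the agent asks for an action by name and never imports a driver directly, which is a different route to the decoupling the PR aims for.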

pirate, Jun 14 '25 07:06

Thanks for the thoughtful feedback. Totally understand the focus on AI capability and performance right now, and agree that avoiding premature complexity makes sense. Glad to hear there’s already a broader architecture in motion that aligns with some of the ideas here. In case this comes up again, please feel free to reach out.

yasithdev, Jun 15 '25 19:06