Expose API for easy integration with browser automation frameworks
blocking #140
We plan to enhance WebWand’s programmability by exposing a JS API. This would facilitate integration with browser automation frameworks like Puppeteer, Playwright, and Selenium.
Providing such an API enables:
- Automatically benchmarking the agent's performance across various tasks and scenarios.
- Leveraging WebWand as a sub-agent of a more complex agentic system, such as a fully autonomous agent that has long-term memory and can solve complex tasks.
- Linking WebWand tasks to triggers (e.g., every day at 8 a.m. or whenever you receive an email).
- Using WebWand as a service in the cloud.
TODO
- [ ] Establish and test messaging connection #157
List of APIs:
- update settings (all options in settings menu)
- start a task (task description, options)
- annotate web page
- get screenshot (clean page, annotated page, combined)
- get data for interactive elements
- set up listener for action history updates
- call individual tools (click, setValue, scroll, go to URL, etc.)
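
As a rough sketch of how the API list above could map onto messages, each entry could become one tagged message shape. Every name here (the `type` strings, `ApiRequest`, `isApiRequest`) is an assumption made for illustration and is not taken from the WebWand codebase.

```typescript
// Hypothetical message shapes, one per entry in the API list above.
// None of these names come from the WebWand codebase.
type ScreenshotKind = "clean" | "annotated" | "combined";
type ToolName = "click" | "setValue" | "scroll" | "goToUrl";

type ApiRequest =
  | { type: "update-settings"; settings: Record<string, unknown> }
  | { type: "start-task"; task: string; options?: Record<string, unknown> }
  | { type: "annotate-page" }
  | { type: "get-screenshot"; kind: ScreenshotKind }
  | { type: "get-interactive-elements" }
  | { type: "subscribe-history" }
  | { type: "call-tool"; tool: ToolName; args: Record<string, unknown> };

const REQUEST_TYPES: ReadonlyArray<string> = [
  "update-settings",
  "start-task",
  "annotate-page",
  "get-screenshot",
  "get-interactive-elements",
  "subscribe-history",
  "call-tool",
];

// Checks only the `type` tag. Payload fields are not validated here, so a
// real listener would still need to check them before acting on a message.
function isApiRequest(value: unknown): value is ApiRequest {
  if (typeof value !== "object" || value === null) return false;
  const type = (value as { type?: unknown }).type;
  return typeof type === "string" && REQUEST_TYPES.indexOf(type) !== -1;
}
```

A tagged union like this would let a single listener route on `type` and let the compiler flag unhandled cases, but the exact set of messages is a design decision for this ticket.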
For the initial design, we will expose only the APIs that are necessary to run the benchmarking tests. Once the project attracts more public interest, we will expand the API to cover more general use cases and finer-grained controls.
This task is more difficult than expected, so I'm changing the ticket size to Large. Here are the action items:
- [ ] Establish message communication between WebWand and external parties. Add the necessary message listeners in WebWand to receive messages sent from outside (such as the Python script used to run the benchmarking tests) and send a "heard" message back to the external party.
- [ ] Implement the specific logic in each message listener to execute the desired action.
More specifically (will keep updating this list):
- [ ] Add a listener for "open-side-panel" in WebWand's background script
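
The two action items above (acknowledge every message, then run the matching action) might look roughly like the sketch below. `registerApiListeners`, the `Handler` type, and the `{ status: "heard" }` reply shape are all hypothetical. The event is passed in as a parameter so the sketch runs outside a browser; in the extension it would presumably be `chrome.runtime.onMessage` or `chrome.runtime.onMessageExternal`, whose listeners receive `(message, sender, sendResponse)`.

```typescript
// Structural stand-ins for the parts of the Chrome messaging API used here.
type SendResponse = (response: unknown) => void;
type MessageListener = (message: unknown, sender: unknown, sendResponse: SendResponse) => void;
interface MessageEvent {
  addListener(listener: MessageListener): void;
}

// Hypothetical handler signature: receives the whole message object.
type Handler = (message: unknown) => void;

function registerApiListeners(event: MessageEvent, handlers: Record<string, Handler>): void {
  event.addListener((message, _sender, sendResponse) => {
    const type =
      typeof message === "object" && message !== null
        ? (message as { type?: unknown }).type
        : undefined;
    if (typeof type !== "string") {
      sendResponse({ status: "error", reason: "message has no type" });
      return;
    }
    const handler = Object.prototype.hasOwnProperty.call(handlers, type)
      ? handlers[type]
      : undefined;
    if (handler === undefined) {
      sendResponse({ status: "error", reason: "unknown message type" });
      return;
    }
    // Acknowledge first so the external party knows the message arrived,
    // then execute the action for this message type.
    sendResponse({ status: "heard", type });
    handler(message);
  });
}

// Possible wiring in the background script (assumption, not tested here):
// registerApiListeners(chrome.runtime.onMessageExternal, {
//   "open-side-panel": () => { /* open the side panel */ },
// });
```

Because the acknowledgement is sent synchronously, the listener does not need to keep the response channel open; a handler that replies with data later would need a different shape.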
This task runs in parallel with #140.
The following logs the approaches I tried that turned out to be unsuitable for WebWand.
- Design an HTTP-based server. When we think about APIs, often the first thing that comes to mind is a web API, which you interact with via HTTP requests (like POST /settings/set-api-key). These are typically used for server-client communication over a network. However, APIs can also be local, meaning they are libraries or frameworks used directly within the same software environment or system.
- Design the API as standalone exported functions. This is not feasible because Chrome extensions operate in an isolated environment. They have a separate scope from regular web pages and from other extensions and scripts running in the browser. Functions defined within a Chrome extension (including those marked with `export`) are not directly accessible to external scripts (including Python or even other JavaScript running outside the extension).
- (As suggested by Mengdi) One way we can actually expose the WebWand API to an external script is to use message listeners.
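
For the message-listener approach, the calling side could be sketched as below. `sendToExtension` and the `SendMessage` type are hypothetical. The send function is injected so the sketch does not depend on a browser; in a page that the extension allows via `externally_connectable`, it would presumably be `chrome.runtime.sendMessage(extensionId, message, callback)`, which an automation framework could invoke by evaluating code in that page.

```typescript
// Stand-in for the callback form of chrome.runtime.sendMessage.
type SendMessage = (
  extensionId: string,
  message: unknown,
  callback: (response: unknown) => void
) => void;

// Wraps the callback-style send in a Promise. Chrome invokes the callback
// with no arguments when delivery fails, so `undefined` is treated as an error.
function sendToExtension(
  send: SendMessage,
  extensionId: string,
  message: { type: string }
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    send(extensionId, message, (response) => {
      if (response === undefined) {
        reject(new Error('no response to "' + message.type + '"'));
        return;
      }
      resolve(response);
    });
  });
}

// Possible use from an allowed page (assumption, not tested here):
// sendToExtension(chrome.runtime.sendMessage, "<extension id>", { type: "open-side-panel" });
```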