fuji-web

Expose API for easy integration with browser automation frameworks

Open • mondaychen opened this issue 1 year ago • 5 comments

blocking #140

We plan to enhance WebWand’s programmability by exposing a JS API. This would facilitate integration with browser automation frameworks like Puppeteer, Playwright, and Selenium.

Providing such an API enables:

  • Automatically benchmarking the agent's performance across various tasks and scenarios.
  • Leveraging WebWand as a sub-agent of a more complex agentic system, such as a fully autonomous agent that has long-term memory and can solve complex tasks.
  • Linking WebWand tasks to signals (e.g., every day at 8 a.m., whenever you receive an email, etc.).
  • Using WebWand as a cloud service.

TODO

  • [ ] Establish and test messaging connection #157

mondaychen • Apr 25 '24

List of APIs:

  • update settings (all options in settings menu)
  • start a task (task description, options)
  • annotate web page
  • get screenshot (clean page, annotated page, combined)
  • get data for interactive elements
  • set up listener for action history updates
  • call individual tools (click, setValue, scroll, go to URL, etc.)
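One way to make this list concrete is a typed message protocol. The TypeScript sketch below assumes each API becomes a message with a "type" field; the type strings, payload fields, the tool name "goToUrl" and the helper isWebWandRequest are hypothetical names for illustration, not WebWand's actual interface.

```typescript
// Hypothetical message protocol for the APIs listed above.
// All "type" strings and payload fields are illustrative assumptions.

type ScreenshotKind = "clean" | "annotated" | "combined";
type ToolName = "click" | "setValue" | "scroll" | "goToUrl";

type WebWandRequest =
  | { type: "update-settings"; settings: Record<string, unknown> }
  | { type: "start-task"; description: string; options?: Record<string, unknown> }
  | { type: "annotate-page" }
  | { type: "get-screenshot"; kind: ScreenshotKind }
  | { type: "get-interactive-elements" }
  | { type: "subscribe-action-history" }
  | { type: "call-tool"; tool: ToolName; args: Record<string, unknown> };

const TOOL_NAMES: ToolName[] = ["click", "setValue", "scroll", "goToUrl"];

// True for non-null objects (payload containers such as settings or args).
function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Messages arriving from outside the extension are untyped JSON,
// so check their shape before dispatching them to a handler.
function isWebWandRequest(msg: unknown): msg is WebWandRequest {
  if (!isObject(msg)) return false;
  switch (msg.type) {
    case "annotate-page":
    case "get-interactive-elements":
    case "subscribe-action-history":
      return true;
    case "update-settings":
      return isObject(msg.settings);
    case "start-task":
      return typeof msg.description === "string" &&
        (msg.options === undefined || isObject(msg.options));
    case "get-screenshot":
      return msg.kind === "clean" || msg.kind === "annotated" || msg.kind === "combined";
    case "call-tool":
      return TOOL_NAMES.indexOf(msg.tool as ToolName) !== -1 && isObject(msg.args);
    default:
      return false;
  }
}
```

Screenshots and action-history updates may need extra care: large payloads and streaming updates would likely fit better on a long-lived connection (chrome.runtime.connect from the caller, chrome.runtime.onConnectExternal in the extension) than on one-shot messages.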

mondaychen • Apr 25 '24

For the initial design, we will expose only the APIs needed to run benchmarking tests. Once the project attracts more public attention, we will expand the API to be both more general-purpose and more fine-grained.

lingjiefeng • May 01 '24

This task is more difficult than expected, so I'm changing the ticket size to Large. Here are the action items:

  • [ ] Establish message-based communication between WebWand and external parties: add the necessary message listeners in WebWand to receive messages sent from outside (such as the Python script used to run the benchmarking tests) and send a "heard" acknowledgment back to the external party.
  • [ ] Implement specific logic in each message listener to execute the desired action.

More specifically (will keep updating this list):

  • [ ] add a listener for "open-side-panel" in WebWand's background script
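A minimal sketch of these action items, assuming external callers send objects of the form { type, payload } and WebWand answers through Chrome's chrome.runtime.onMessageExternal listener. The reply shape, the handler table and the handleExternalMessage helper are hypothetical; only the "open-side-panel" type comes from the list above, and its handler is a placeholder.

```typescript
// Sketch of an external-message router for WebWand's background script.
// Message shape, reply shape and handler bodies are hypothetical.

type ExternalMessage = { type: string; payload?: unknown };
type Reply =
  | { status: "heard"; type: string; result: unknown }
  | { status: "error"; type: string; error: string };
type Handler = (payload: unknown) => unknown;

// One entry per supported message type; extend as the list of APIs grows.
const handlers: Record<string, Handler> = {
  // Placeholder: the real handler would open WebWand's side panel.
  "open-side-panel": () => ({ opened: true }),
};

function handleExternalMessage(message: unknown, sendResponse: (reply: Reply) => void): void {
  if (typeof message !== "object" || message === null ||
      typeof (message as ExternalMessage).type !== "string") {
    sendResponse({ status: "error", type: "unknown", error: "malformed message" });
    return;
  }
  const { type, payload } = message as ExternalMessage;
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
    sendResponse({ status: "error", type, error: `no listener for "${type}"` });
    return;
  }
  try {
    // The "heard" acknowledgment carries the handler's result back to the caller.
    sendResponse({ status: "heard", type, result: handlers[type](payload) });
  } catch (err) {
    sendResponse({ status: "error", type, error: String(err) });
  }
}

// Only wire up the Chrome listener when running inside the extension.
declare const chrome: any;
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessageExternal) {
  chrome.runtime.onMessageExternal.addListener(
    (message: unknown, _sender: unknown, sendResponse: (reply: Reply) => void) => {
      handleExternalMessage(message, sendResponse);
    },
  );
}
```

Keeping the routing logic in a plain function, separate from the Chrome listener, makes it testable outside the browser. For external pages to reach the listener at all, their origin has to be listed under "externally_connectable" in the extension manifest.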

lingjiefeng • May 02 '24

This task runs in parallel with #140.

lingjiefeng • May 03 '24

Below is a log of the approaches I've considered; the first two turned out not to be suitable for WebWand.

  1. Design an HTTP-based server. When we think about APIs, the first thing that often comes to mind is a web API, which you interact with via HTTP requests (like POST /settings/set-api-key). These are typically used for server-client communication over a network. However, APIs can also be local, meaning they are libraries or frameworks used directly within the same software environment or system.
  2. Design the API as standalone "export" functions. This is not feasible because Chrome extensions run in an isolated environment, with a scope separate from regular web pages and from other extensions and scripts running in the browser. Functions defined within a Chrome extension (including those marked with "export") are not directly accessible to external scripts, whether Python or JavaScript running outside the extension.
  3. (As suggested by Mengdi) One way to truly "expose" the WebWand API to an external script is through message listeners.
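On the caller side of the message-listener approach, a script running in a web page (for example, one opened by Puppeteer, Playwright or Selenium) can call chrome.runtime.sendMessage with WebWand's extension ID. The TypeScript sketch below wraps that call; WebWandClient, the reply shape and the injected transport are hypothetical and assume the background listener replies with a "status" field.

```typescript
// Hypothetical caller-side wrapper for a script running in a web page.
// The transport has the same callback shape as
// chrome.runtime.sendMessage(extensionId, message, callback); it is injected
// so the wrapper can be exercised without a browser.

type Transport = (extensionId: string, message: object, callback: (response?: unknown) => void) => void;

interface WebWandReply {
  status: string;
  type: string;
  result?: unknown;
  error?: string;
}

class WebWandClient {
  private readonly extensionId: string;
  private readonly send: Transport;

  constructor(extensionId: string, send: Transport) {
    this.extensionId = extensionId;
    this.send = send;
  }

  // Sends one request and passes the extension's reply to onDone. In Chrome, a
  // missing listener typically leaves the response undefined (and sets
  // chrome.runtime.lastError), so that case is turned into an error reply.
  call(type: string, payload: unknown, onDone: (reply: WebWandReply) => void): void {
    this.send(this.extensionId, { type, payload }, (response) => {
      const reply = response as WebWandReply | undefined;
      if (!reply || typeof reply.status !== "string") {
        onDone({ status: "error", type, error: "no reply from extension" });
        return;
      }
      onDone(reply);
    });
  }
}
```

In a real page, the transport could simply be chrome.runtime.sendMessage, provided the page's origin is allowed by "externally_connectable". A Python benchmarking script could then reach it through its automation framework, for example with Selenium's execute_async_script.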

lingjiefeng • May 03 '24