Add headless browser to the WebSurferAgent
Why are these changes needed?
This adds a simple Selenium-driven headless browser to the WebSurferAgent. It does not yet make the headless browser agent multimodal (e.g., it can't do anything with images in the browser), but it should work much better for JavaScript-powered websites than the current SimpleTextBrowser implementation.
This also refactors the existing SimpleTextBrowser implementation so that both browsers share a common base class, and adds unit tests covering both the existing and the new HeadlessChromeBrowser implementations.
https://app.codecov.io/github/microsoft/autogen/pull/1534 shows the coverage additions there
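For readers skimming the thread, here is a minimal sketch of the shared-base-class idea described above. It is not the code in this PR: the class name `AbstractBrowser` is only guessed from the file name `abstract_browser.py`, and the `address` property, the `visit_page` method, and the `InMemoryBrowser` stand-in are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AbstractBrowser(ABC):
    """Hypothetical common interface; member names are illustrative only."""

    @property
    @abstractmethod
    def address(self) -> str:
        """URL of the page currently loaded."""

    @abstractmethod
    def visit_page(self, url: str) -> str:
        """Load `url` and return the page content as text."""


class InMemoryBrowser(AbstractBrowser):
    """Toy stand-in for a real browser, used only to exercise the interface."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self._pages = pages
        self._address = "about:blank"

    @property
    def address(self) -> str:
        return self._address

    def visit_page(self, url: str) -> str:
        self._address = url
        # Unknown URLs yield an empty page rather than raising.
        return self._pages.get(url, "")


def check_browser_contract(browser: AbstractBrowser, url: str) -> str:
    """One check that can run unchanged against any implementation."""
    text = browser.visit_page(url)
    assert browser.address == url
    return text
```

With an interface like this, the same test body can be parametrized over a text-based and a Selenium-based implementation, which is roughly what "unit tests across both" suggests.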
Related issue number
Closes #1481
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
Codecov Report
Attention: Patch coverage is 88.11189% with 17 lines in your changes missing coverage. Please review.
Project coverage is 48.00%. Comparing base (26daa18) to head (b0ab6c1). Report is 734 commits behind head on 0.2.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| autogen/browser_utils/headless_chrome_browser.py | 90.42% | 7 Missing and 2 partials :warning: |
| autogen/browser_utils/abstract_browser.py | 73.33% | 8 Missing :warning: |
Additional details and impacted files
Coverage Diff

| | Metric | 0.2 | #1534 | +/- |
|---|---|---|---|---|
| + | Coverage | 34.26% | 48.00% | +13.73% |
| | Files | 42 | 45 | +3 |
| | Lines | 5099 | 5225 | +126 |
| | Branches | 1165 | 1261 | +96 |
| + | Hits | 1747 | 2508 | +761 |
| + | Misses | 3209 | 2513 | -696 |
| - | Partials | 143 | 204 | +61 |
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 47.94% <88.11%> (+13.68%) | :arrow_up: |
First of all, this looks fantastic. Thanks so much for working on this.
I will dig in and try this out as soon as possible today. We've discussed some possibilities on Discord already, but those are arguably future improvements. I think driving the browser and using innerHTML are perfect first steps and are completely sufficient for a first PR.
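To make the "drive the browser, then read innerHTML" idea concrete, here is a rough sketch under stated assumptions. It is not the PR's implementation: `fetch_inner_html` and `inner_html_to_text` are hypothetical helper names, the text extraction uses only the standard library for illustration, and the Selenium part assumes Selenium 4 plus a local Chrome that accepts `--headless=new` (older Chrome versions use `--headless`).

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def inner_html_to_text(inner_html: str) -> str:
    """Reduce an innerHTML string to newline-separated visible text."""
    parser = _TextExtractor()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered at end of input
    return "\n".join(parser.chunks)


def fetch_inner_html(url: str) -> str:
    """Render `url` in headless Chrome and return the body's innerHTML.

    Assumes selenium is installed and Chrome is available locally.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.execute_script("return document.body.innerHTML;")
    finally:
        driver.quit()
```

Because the page is rendered before innerHTML is read, content injected by JavaScript is included, which is the main gain over a plain HTTP fetch.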
One thing to test will be PDFs. In many benchmark scenarios, PDFs are the final document sought by browsing, but they don't have innerHTML. As a first step, simply downloading them to the Downloads folder might be sufficient.
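One possible way to get the "just download PDFs" behavior is through Chrome preferences, sketched below. This is an assumption about how it could be done, not something this PR implements: `pdf_download_prefs` and `make_headless_chrome` are hypothetical helper names, and whether downloads succeed in headless mode can depend on the Chrome version.

```python
import os
from typing import Any, Dict


def pdf_download_prefs(download_dir: str) -> Dict[str, Any]:
    """Chrome preferences that save PDFs to disk instead of rendering them."""
    return {
        # Chrome expects an absolute path here.
        "download.default_directory": os.path.abspath(download_dir),
        # Do not show a "Save as" dialog; headless Chrome cannot answer it.
        "download.prompt_for_download": False,
        # Bypass the built-in PDF viewer so the file is downloaded.
        "plugins.always_open_pdf_externally": True,
    }


def make_headless_chrome(download_dir: str):
    """Build a headless Chrome driver with the PDF download preferences.

    Assumes selenium is installed and Chrome is available locally.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

After navigating to a PDF URL with such a driver, the agent could look for the new file in `download_dir` rather than reading innerHTML.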
Just adding some notes while I continue to test this:
To install Chrome in Docker or WSL:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
If dpkg fails on missing dependencies, resolve them with:
sudo apt -f install
Then run the install again:
sudo dpkg -i google-chrome-stable_current_amd64.deb
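After the install, a quick sanity check from Python could look like the sketch below. The helper names `find_chrome` and `chrome_version` are hypothetical, and the binary names `google-chrome` and `google-chrome-stable` are an assumption about what the .deb package puts on the PATH.

```python
import shutil
import subprocess
from typing import Optional, Sequence

# Assumed executable names; adjust if your install uses a different one.
CANDIDATES = ("google-chrome", "google-chrome-stable")


def find_chrome(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def chrome_version(path: str) -> str:
    """Run `<path> --version` and return its output as a stripped string."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    chrome = find_chrome()
    if chrome is None:
        print("Chrome not found on PATH")
    else:
        print(chrome, chrome_version(chrome))
```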
✅ There are no secrets present in this pull request anymore.
This PR is against AutoGen 0.2. AutoGen 0.2 has been moved to the 0.2 branch. Please rebase your PR on the 0.2 branch or update it to work with the new AutoGen 0.4 that is now in main.
@vijaykramesh Closing as stale. Please address the review comments and update the PR if you would like to reopen it.