autogen
WebSurfer Updated (Selenium, Playwright, and support for many filetypes)
Why are these changes needed?
This PR adds Selenium and Playwright variants of the Markdown Web Browser used by WebSurfer. It also adds support for many additional content types and for alternate search engines.
All MarkdownBrowser variants work on the same principle: (1) fetch a page, (2) convert it to Markdown, (3) operate on the Markdown.
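The fetch, convert, operate principle can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `fetch`, `html_to_markdown`, and `find_on_page` are hypothetical names, and the converter handles only headings, paragraphs, and links.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class _MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, and links only."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = None
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self.parts.append(f"]({self._href or ''})")
            self._href = None

    def handle_data(self, data):
        if not self._skip:
            # Collapse runs of whitespace but keep word boundaries.
            self.parts.append(re.sub(r"\s+", " ", data))


def fetch(url, timeout=10.0):
    """Step 1: fetch a page (works for http(s):// and file:// URLs)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def html_to_markdown(html):
    """Step 2: convert the fetched HTML to Markdown."""
    parser = _MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def find_on_page(markdown, query):
    """Step 3: operate on the Markdown (here, a case-insensitive line search)."""
    return [line for line in markdown.splitlines() if query.lower() in line.lower()]
```

Everything after step 2 works on plain Markdown text, so the Selenium and Playwright variants would only need to replace the `fetch` step in a design like this one.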
Such browsers are simple and suitable for read-only agentic use; they cannot be used to interact with complex web applications. Nevertheless, they are a great stopgap, and are especially useful when browsing local files (e.g., file:///user/afourney/repos/autogen) because they can handle many different file formats (Office docs, PDFs, etc.) and provide a common interface for Q&A, summarization, passage extraction, and so on.
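One way to picture the multi-format support is a registry that maps a file extension to a function returning Markdown. The sketch below is an assumption for illustration only: `CONVERTERS` and `convert` are hypothetical names, only `.txt` and `.csv` are handled, and it does not reflect how the PR's converter is actually organized.

```python
import csv
import io
from pathlib import Path


def _text_to_markdown(data: bytes) -> str:
    # Plain text is already valid Markdown for our purposes.
    return data.decode("utf-8", errors="replace")


def _csv_to_markdown(data: bytes) -> str:
    # Render a CSV file as a Markdown table, using the first row as the header.
    rows = list(csv.reader(io.StringIO(data.decode("utf-8", errors="replace"))))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical registry: extension -> converter. New formats plug in here.
CONVERTERS = {
    ".txt": _text_to_markdown,
    ".csv": _csv_to_markdown,
}


def convert(filename: str, data: bytes) -> str:
    """Pick a converter by file extension and return Markdown text."""
    suffix = Path(filename).suffix.lower()
    converter = CONVERTERS.get(suffix)
    if converter is None:
        raise ValueError(f"No converter registered for {suffix!r}")
    return converter(data)
```

Because every converter returns Markdown, downstream operations such as Q&A or summarization do not need to know the original file type.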
Instructions
When installing AutoGen, use the [websurfer] optional dependencies.
If using Selenium, you must also run `pip install selenium`.
If using Playwright, you must run both `pip install playwright` and `playwright install --with-deps chromium`.
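Since Selenium and Playwright are optional, it can help to check which of them is importable before choosing a browser variant. The helper below is a hypothetical sketch (`available_backends` is not part of AutoGen); it only checks that the Python packages are installed, not that a browser binary such as Chromium is present.

```python
from importlib.util import find_spec


def available_backends():
    """Report which optional browser packages can be imported."""
    return {
        "selenium": find_spec("selenium") is not None,
        "playwright": find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for name, installed in available_backends().items():
        print(f"{name}: {'installed' if installed else 'missing'}")
```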
Related issue number
#1481, #1534, #1733, #1832
Codecov Report
Attention: Patch coverage is 60.62133%, with 469 lines in your changes missing coverage. Please review.
Project coverage is 50.75%. Comparing base (c3193f8) to head (6ba05c9).
Additional details and impacted files
@@ Coverage Diff @@
## main #1929 +/- ##
===========================================
+ Coverage 37.94% 50.75% +12.80%
===========================================
Files 77 83 +6
Lines 7784 8776 +992
Branches 1667 2040 +373
===========================================
+ Hits 2954 4454 +1500
+ Misses 4580 3946 -634
- Partials 250 376 +126
| Flag | Coverage Δ |
|---|---|
| unittest | 12.75% <0.08%> (?) |
| unittests | 49.80% <60.62%> (+11.86%) :arrow_up: |
@signalprime @vijaykramesh @INF800
With this PR, I tried to combine your Selenium browser PRs together in one place. Even if it doesn't show in the commit history, I used and learned a lot from each of your contributions, and welcome your further comments and contributions here. Once this is ready, the final PR will credit each of you, and we can perhaps co-author a Blog post.
Further, I believe @INF800's and @vijaykramesh's PRs used Selenium to call Bing search -- which is clever in that it simplifies the requirements for getting up and running (you don't need to register for an API key). However, I opted to leave this out in favor of the API because it is a better fit for our automated use. Bing actively discourages scraping, and supporting that approach long term would involve actively evading bot detection. I am open to adding further modularity and configurability to support other search engines that don't require an API key, perhaps DuckDuckGo, ArXiv, etc.
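To make the modularity idea concrete, here is a minimal sketch of what a pluggable search interface could look like. All names (`SearchResult`, `SearchProvider`, `StaticSearch`, `results_to_markdown`) are hypothetical and are not taken from this PR; the in-memory provider stands in for a real engine so the example runs without network access or an API key.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class SearchProvider(Protocol):
    """Anything with a compatible `search` method can act as a search engine."""

    def search(self, query: str, max_results: int = 10) -> List[SearchResult]:
        ...


class StaticSearch:
    """In-memory provider used here only for demonstration and testing."""

    def __init__(self, corpus: Iterable[SearchResult]):
        self._corpus = list(corpus)

    def search(self, query: str, max_results: int = 10) -> List[SearchResult]:
        q = query.lower()
        hits = [r for r in self._corpus
                if q in r.title.lower() or q in r.snippet.lower()]
        return hits[:max_results]


def results_to_markdown(provider: SearchProvider, query: str) -> str:
    """Render results as a Markdown page, whichever provider produced them."""
    lines = [f"## Search results for '{query}'"]
    for i, r in enumerate(provider.search(query), 1):
        lines.append(f"{i}. [{r.title}]({r.url})\n   {r.snippet}")
    return "\n".join(lines)
```

Under this design, a Bing API provider and a keyless provider would differ only in their `search` method, and the browser would keep operating on the rendered Markdown.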
This is great! We appreciate the credits and would love to co-author a blog post about it. A few weeks back I worked toward building DuckDuckGo search as an ability/skill that could be attached to agents as needed. I'll need to review the latest project direction to make sure I'm following the agreed-upon path, and I'm happy to assist wherever else I may be useful to the project. Thanks @afourney and nice work!
@signalprime DuckDuckGo would make a great addition and would be a good check on whether the search mechanism is as easy to extend as I hope.
i'm really excited about this one folks, looks like there's a lot to do yet, for youtube and more
@afourney following up on our previous discussion, I'm curious about your plans for making the audio transcription logic in mdconvert.py reusable. Given that I'm currently working on audio capabilities for agents https://github.com/microsoft/autogen/pull/2098, do you have any thoughts on how we could develop a shared audio module?
Thanks for the feedback @davorrunje Super helpful! Will address issues in subsequent commits, asap.
@afourney these are minor details. Great work, I am looking forward to testing it in production!
⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.
Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.
🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 10404662 | Triggered | Generic CLI Secret | 802f099588bedf1d022b2bba5fb534635df8e6f1 | .github/workflows/dotnet-release.yml |
| 10404662 | Triggered | Generic CLI Secret | 8a6ebe1cf8749fd9c501fe0949da824c8262fc84 | .github/workflows/dotnet-release.yml |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.
@afourney looks like the last thing to fix is the Git LFS. Can you install git lfs?
@afourney before you merge, please check https://github.com/microsoft/autogen/pull/1929/files#r1574300565; it should support a custom Bing Search API URL.
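A minimal sketch of what the request above could look like, assuming the endpoint is passed as an optional parameter. `build_bing_request` and its `base_url` argument are hypothetical names, not the PR's API; the default URL, the `q` and `count` parameters, and the `Ocp-Apim-Subscription-Key` header follow the public Bing Web Search API v7. The function only builds the request and does not send it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

DEFAULT_BING_URL = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query: str, api_key: str,
                       base_url: Optional[str] = None,
                       count: int = 10) -> Request:
    """Build (but do not send) a Bing search request against a configurable endpoint."""
    # Fall back to the public endpoint when no custom URL is supplied.
    url = (base_url or DEFAULT_BING_URL).rstrip("/")
    params = urlencode({"q": query, "count": count})
    return Request(f"{url}?{params}",
                   headers={"Ocp-Apim-Subscription-Key": api_key})
```

A caller behind a proxy or a private gateway would pass its own `base_url`; everyone else gets the default.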