Add AI research agent using smolagents
This PR introduces a powerful, interactive Web Research Assistant built using Streamlit, integrating enhanced search tools, web scraping, AI agents, and markdown-based summarization. The tool is designed to provide recent, relevant, and readable information from the web, driven by the Hugging Face Qwen2.5-Coder-32B-Instruct model and DuckDuckGo search APIs.
🧠 Key Features:
- Streamlit Interface: Intuitive, real-time search assistant with multi-tab results display (Results, Sources, Analysis, Logs).
- Enhanced Search Agent
- Webpage Content Extraction
- Agent Framework
- Logging & Debugging
- Export Capabilities
- Sidebar Settings
Install the dependencies
pip install streamlit pandas requests beautifulsoup4 markdownify python-dotenv smolagents huggingface_hub
Add your Hugging Face token
HF_TOKEN="hf_..."
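As a rough sketch of how the app might read this token at startup, the helper below looks up `HF_TOKEN` and fails early with a clear message when it is missing or still set to the placeholder. The function name `get_hf_token` and the error text are illustrative assumptions, not code from this PR; if the token lives in a `.env` file, it would need to be loaded into the environment first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "hf_..."  # the placeholder value shown in the setup step


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, or raise if it is missing.

    `get_hf_token` is a hypothetical helper; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            "HF_TOKEN is not set. Add your Hugging Face token to the environment."
        )
    return token
```

Failing at startup keeps a missing token from surfacing later as an opaque authentication error inside an agent run.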
Summary by CodeRabbit
- New Features
  - Introduced a web-based research assistant that performs enhanced web searches, retrieves and summarizes content, and prioritizes recent sources through an interactive interface.
- Documentation
  - Added a README with an overview, usage instructions, and a demo video link.
  - Included an example environment file for setting up authentication tokens.
- Chores
  - Added a requirements file listing necessary Python packages for the project.
Walkthrough
A new research assistant application is introduced using the Smol Agents framework. The update adds a Streamlit-based interface, agent logic for enhanced web search and webpage retrieval, supporting utilities, and documentation. Dependency management and environment variable configuration are provided via requirements and example environment files.
Changes
| File(s) | Change Summary |
|---|---|
| researchagent-smolagents/README.md | Added a README introducing the project, its purpose, usage, and demo link. |
| researchagent-smolagents/agents.py | Implemented the main Streamlit app, agent logic, search and webpage tools, logging utilities, and UI components. |
| researchagent-smolagents/requirements.txt | Added required Python dependencies for the project. |
| researchagent-smolagents/.env.example | Provided an example environment file with a placeholder for the Hugging Face token. |
Sequence Diagram(s)
sequenceDiagram
participant User
participant Streamlit UI
participant ManagerAgent
participant ToolCallingAgent
participant EnhancedSearchTool
participant VisitWebpageTool
User->>Streamlit UI: Enter search query and settings
Streamlit UI->>ManagerAgent: Run manager agent with query and parameters
ManagerAgent->>ToolCallingAgent: Delegate search task
ToolCallingAgent->>EnhancedSearchTool: Perform enhanced search
EnhancedSearchTool->>ToolCallingAgent: Return search results
ToolCallingAgent->>VisitWebpageTool: Fetch webpage content (as needed)
VisitWebpageTool->>ToolCallingAgent: Return webpage content
ToolCallingAgent->>ManagerAgent: Return aggregated results
ManagerAgent->>Streamlit UI: Provide formatted response
Streamlit UI->>User: Display results, sources, analysis, and logs
Poem
A rabbit hopped to Streamlit’s door,
With agents and tools, ready to explore.
It searched the web, fetched pages anew,
Summarized findings, all for you!
With tokens and logs, dependencies set,
This research bunny’s your best bet.
🐇✨
PR Review: Add AI research agent using smolagents
Thank you for this contribution! I've reviewed the PR which adds a new AI research agent using the smolagents framework. Here's my feedback:
Strengths
- Well-structured application: The Streamlit app is well-organized with clear separation of concerns between UI, search functionality, and agent orchestration.
- Comprehensive features: The implementation includes enhanced search, content extraction, recency scoring, and export capabilities.
- User-friendly interface: The multi-tab display (Results, Sources, Analysis, Logs) provides a good user experience.
- Error handling: The code includes proper error handling for web requests and agent operations.
Suggested Improvements
1. Security & Best Practices
- Environment Variables: As CodeRabbit suggested, rename `.env` to `.env.example` and use a placeholder for the token. Add `.env` to `.gitignore` to prevent accidental token commits.
- Web Requests: Add timeouts and proper user-agent headers to the `visit_webpage` function to prevent hanging on slow sites and avoid being blocked.
- Unused Code: Remove the unused HTTP request in the `enhanced_search` function.
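The web-request suggestion could look roughly like the sketch below. It is not the PR's `visit_webpage` implementation: the name `fetch_page`, the `http_get` parameter, the User-Agent string, and the timeout values are all assumptions chosen for illustration. The only library call relied on is `requests.get` with its `headers` and `timeout` arguments.

```python
from typing import Callable, Optional

# Hypothetical values; pick a User-Agent that identifies your own app.
DEFAULT_HEADERS = {"User-Agent": "research-agent-example/0.1"}
DEFAULT_TIMEOUT = (5, 15)  # (connect, read) timeouts in seconds


def fetch_page(url: str, http_get: Optional[Callable] = None) -> str:
    """Fetch a page with an explicit timeout and User-Agent header.

    `http_get` defaults to `requests.get`; passing a stand-in makes the
    function testable without network access.
    """
    if http_get is None:
        import requests  # imported lazily so a stand-in needs no install

        http_get = requests.get
    response = http_get(url, headers=DEFAULT_HEADERS, timeout=DEFAULT_TIMEOUT)
    response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return response.text
```

Without a `timeout`, `requests.get` can wait indefinitely on an unresponsive server, which would stall the agent mid-run.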
2. Dependencies
- Missing Requirements: Add `pandas` and `beautifulsoup4` to `requirements.txt` as they are imported but not listed.
- Version Pinning: Consider pinning dependency versions for more deterministic builds, e.g., `streamlit>=1.34,<2.0`.
3. Documentation
- Demo Link: The demo video link in the README appears to be broken. Please update with a working link.
- Setup Instructions: Consider adding more detailed setup instructions in the README, including:
  - How to create and activate a virtual environment
  - How to run the application (`streamlit run agents.py`)
  - Example queries to try
4. Performance Considerations
- Large Page Handling: Consider limiting download size for web pages (e.g., via `stream=True` and `iter_content`) to handle very large pages gracefully.
- Caching Strategy: The current caching implementation is good, but you might want to add TTL (time-to-live) for cached content.
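One way to cap download size, as a minimal sketch: stream the response and stop reading once a byte budget is reached. The names `read_capped` and `fetch_capped`, the 2 MB limit, and the timeout values are assumptions for illustration and do not come from the PR. The `requests` calls used (`stream=True`, `iter_content`, `raise_for_status`) are standard.

```python
from typing import Iterable

MAX_BYTES = 2_000_000  # hypothetical cap; tune for your use case


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_BYTES) -> bytes:
    """Concatenate chunks, truncating once max_bytes have been collected."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk[: max_bytes - len(buf)])
        if len(buf) >= max_bytes:
            break  # stop pulling from the stream once the budget is used
    return bytes(buf)


def fetch_capped(url: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Download at most max_bytes of a page body using a streamed request."""
    import requests  # lazy import keeps read_capped usable on its own

    with requests.get(url, stream=True, timeout=(5, 15)) as response:
        response.raise_for_status()
        return read_capped(response.iter_content(chunk_size=8192), max_bytes)
```

Separating the truncation logic from the network call keeps the cap easy to test with plain byte chunks, and the `with` block ensures the connection is released even when the body is only partly read.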
Minor Issues
- There's a typo in the README title: "AI Reseach Agent" should be "AI Research Agent"
Overall, this is a valuable addition to the repository. With the suggested improvements, it will be even more robust and user-friendly. Great work!