open_deep_research
Is it possible to use the arXiv API instead of Perplexity?
In the "search_api" configuration, is it possible to use arXiv to retrieve scientific papers and use that information instead of a regular web search?
I've added EXA as a search API, and arXiv and PubMed as separate tools. It's super easy to integrate.
Here's the arXiv tool documentation:
https://python.langchain.com/docs/integrations/tools/arxiv/
Make sure that the response is formatted in the structure expected by deduplicate_and_format_sources.
Below is an example from my implementation:
configuration.py
...
class SearchAPI(Enum):
    PERPLEXITY = "perplexity"
    TAVILY = "tavily"
    EXA = "exa"
...
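If arXiv and PubMed were made selectable through the same setting, the enum could be extended roughly as sketched below. The ARXIV and PUBMED members, their string values, and the parse_search_api helper are assumptions for illustration and are not taken from the repository.

```python
from enum import Enum


class SearchAPI(Enum):
    PERPLEXITY = "perplexity"
    TAVILY = "tavily"
    EXA = "exa"
    # Hypothetical additions; member names and values are assumptions.
    ARXIV = "arxiv"
    PUBMED = "pubmed"


def parse_search_api(value: str) -> SearchAPI:
    """Map a config string such as "arxiv" to its enum member (hypothetical helper)."""
    return SearchAPI(value.strip().lower())
```

An unknown value raises ValueError, which surfaces a typo in the configuration early instead of silently skipping the search step.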
graph.py
This logic appears in multiple places:
...
# Search the web
if search_api == "tavily":
    search_results = await tavily_search_async(query_list)
    source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
elif search_api == "perplexity":
    search_results = perplexity_search(query_list)
    source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
elif search_api == "exa":
    search_results = await exa_search(query_list)
    source_str = deduplicate_and_format_sources(search_results, max_tokens_per_source=1000, include_raw_content=False)
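Since this branch is repeated in several places, one option is to route the lookup through a single helper so that a new backend only needs one registry entry. The sketch below is not how the repository does it: SEARCH_FUNCTIONS and run_search are hypothetical names, and the two search functions are stand-in stubs so the example runs on its own.

```python
import asyncio
import inspect
from typing import Any, Callable, Dict, List


# Stand-ins for the real search functions so the sketch is self-contained.
async def tavily_search_async(queries: List[str]) -> List[dict]:
    return [{"query": q, "results": []} for q in queries]


def perplexity_search(queries: List[str]) -> List[dict]:
    return [{"query": q, "results": []} for q in queries]


# Hypothetical registry: one entry per supported search_api value.
SEARCH_FUNCTIONS: Dict[str, Callable[[List[str]], Any]] = {
    "tavily": tavily_search_async,
    "perplexity": perplexity_search,
}


async def run_search(search_api: str, query_list: List[str]) -> List[dict]:
    """Look up the search function and call it, awaiting the result if needed."""
    try:
        search_fn = SEARCH_FUNCTIONS[search_api]
    except KeyError:
        raise ValueError(f"Unsupported search API: {search_api}") from None
    results = search_fn(query_list)
    # Some backends are coroutines (Tavily, Exa) and some are plain functions.
    if inspect.isawaitable(results):
        results = await results
    return results


if __name__ == "__main__":
    print(asyncio.run(run_search("perplexity", ["arxiv api"])))
```

Each call site would then reduce to search_results = await run_search(search_api, query_list), followed by the existing deduplicate_and_format_sources call.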
In your arxiv_search method, ensure that the returned structure matches the format expected by deduplicate_and_format_sources:
"""
...
Args:
    search_queries (List[SearchQuery]): List of search queries to process

Returns:
    List[dict]: List of search responses from the Perplexity API, one per query. Each response should have the format:
        {
            'query': str,                 # The original search query
            'follow_up_questions': None,
            'answer': None,
            'images': list,
            'results': [                  # List of search results
                {
                    'title': str,             # Title of the search result
                    'url': str,               # URL of the result
                    'content': str,           # Summary/snippet of the content
                    'score': float,           # Relevance score
                    'raw_content': str|None   # Full content or None for secondary citations
                },
                ...
            ]
        }
...
"""
# Your search logic
Let me know if you need any help.
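For anyone replicating this, here is a minimal sketch of the formatting step only, assuming the papers come from arXiv's public query API, which returns an Atom XML feed. It is not the implementation from the PR. The function name atom_feed_to_response and the rank-based score are assumptions, and fetching the feed is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def atom_feed_to_response(query: str, atom_xml: str) -> dict:
    """Convert one arXiv Atom feed into the response dict described above."""
    root = ET.fromstring(atom_xml)
    entries = root.findall("atom:entry", ATOM_NS)
    results = []
    for rank, entry in enumerate(entries):
        # Collapse the line breaks arXiv puts inside titles and abstracts.
        title = " ".join(entry.findtext("atom:title", "", ATOM_NS).split())
        summary = " ".join(entry.findtext("atom:summary", "", ATOM_NS).split())
        url = entry.findtext("atom:id", "", ATOM_NS).strip()
        results.append({
            "title": title,
            "url": url,
            "content": summary,
            # The feed carries no relevance score; this rank-based value is an assumption.
            "score": 1.0 - rank / max(len(entries), 1),
            "raw_content": None,
        })
    return {
        "query": query,
        "follow_up_questions": None,
        "answer": None,
        "images": [],
        "results": results,
    }
```

An arxiv_search function would call this once per query and return the list of dicts. If you use the LangChain arXiv tool linked above instead, the same mapping applies to whatever fields it returns.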
Yes, do you mind creating a PR? These are nice additions.
That's awesome! I'll try to replicate that. Thank you so much! @bartolli
@rlancemartin Done. I'll add arXiv and PubMed in a separate PR.
Thanks for the Exa PR! I had some minor comments. Let's also add arXiv and PubMed. Please include them in the README.
Running final tests for PubMed and arXiv, will commit the changes and update the README tonight.
@rlancemartin Added arXiv and PubMed APIs as requested! Both follow the same pattern as the other search implementations. Ready for review 👍