paper-qa
CLI functionality in Python module
I am loving this library! The README is very thorough in showing the library functions on the CLI. For instance, it shows clearly how to name an index and save it using pqa -i nanomaterials index, and then run a full-text search using pqa -i nanomaterials search thermoelectrics. But it is not clear to me how to execute the same functionality with the Python module. Can I recreate this by doing something like:
from paperqa import Settings, ask, search_query
search_query(query="test query", index_name="pqa_index_3cd017b0934549ec2602837942e27025", index_directory="~/.pqa/indexes/", settings=Settings())
I would like some hints on how to name the index and how to execute a search over the index. I would really appreciate your help.
Hi @ivanhigueram! I'm here to help you with any questions or issues you have. Let's tackle this together!
To name an index and execute a search over the index using the Python module in the paperqa library, you can use the SearchIndex class. Here is an example of how to do this:
- Create and name the index:
from paperqa.agents.search import SearchIndex
from paperqa.settings import Settings
# Create a SearchIndex instance with a custom name
index_name = "nanomaterials"
settings = Settings() # Load or configure your settings as needed
search_index = SearchIndex(index_name=index_name, settings=settings)
# Initialize the directory for the index
await search_index.init_directory()
- Add documents to the index:
# Example document to add
index_doc = {
"file_location": "path/to/document",
"body": "This is the content of the document."
}
# Add the document to the index
await search_index.add_document(index_doc)
- Execute a search over the index:
# Perform a search query
query = "thermoelectrics"
results = await search_index.query(query, top_n=10)
# Print the search results
for result in results:
print(result)
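This code snippet demonstrates how to name an index, add documents to it, and execute a search query using the paperqa library's Python module. The Settings class is used to configure the settings for the SearchIndex [1][2]. Note that the await calls only work inside an async function (for example, one started with asyncio.run) or in an environment that supports top-level await, such as a Jupyter notebook.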
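Because the snippet uses await at the top level, running it as a plain script requires wrapping the three steps in a coroutine. The minimal sketch below shows that pattern. It assumes only the calls shown in the answer (init_directory(), add_document(index_doc), query(query, top_n=...)), with their signatures taken on trust; the AsyncIndex protocol and both helper names are hypothetical.

```python
import asyncio
from typing import Any, Protocol


class AsyncIndex(Protocol):
    """Hypothetical protocol mirroring the three SearchIndex calls used above
    (init_directory, add_document, query); the signatures are assumed."""

    async def init_directory(self) -> None: ...
    async def add_document(self, index_doc: dict) -> None: ...
    async def query(self, query: str, top_n: int = 10) -> list[Any]: ...


async def build_and_search(
    index: AsyncIndex, docs: list[dict], query: str, top_n: int = 10
) -> list[Any]:
    # Every awaited call lives inside this coroutine, so the same steps
    # (initialize, add documents, query) also work in a plain script.
    await index.init_directory()
    for doc in docs:
        await index.add_document(doc)
    return await index.query(query, top_n=top_n)


def run_build_and_search(
    index: AsyncIndex, docs: list[dict], query: str, top_n: int = 10
) -> list[Any]:
    # asyncio.run starts an event loop, runs the coroutine, then closes the loop.
    # It raises RuntimeError if a loop is already running (e.g. in Jupyter),
    # where a top-level `await build_and_search(...)` works instead.
    return asyncio.run(build_and_search(index, docs, query, top_n=top_n))
```

With the objects from the answer, the call would look like run_build_and_search(search_index, [index_doc], "thermoelectrics"). Keeping all the awaits in one coroutine means a single event loop handles the whole sequence.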
To continue talking to Dosu, mention @dosu.
This is good, but how can I use the index from a previous ask call to execute a search? That is the part that is not clear in the documentation.
In the example you provide, I'd have to add each document individually, but I just want to use the ones used in a previous query (i.e. using agent_query) and do everything in Python.
To search previous answers without adding each document individually, you can run the search subcommand with the index option (-i) set to answers. This searches the index in which previous answers are stored.
Here is how you can do it:
pqa -i answers search "your search query"
This command will search the index of previous answers for the specified query [1].
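A Python counterpart of that command might look like the sketch below. It assumes paper-qa v5.0.x, where search_query(query, index_name, settings) lives in paperqa.agents and returns a list of (result, location) pairs. The maintainer confirms later in this thread that the answers index can be searched through search_query, but the import path, and whether the function is synchronous, may differ between releases, so the sketch runs the result as a coroutine if it gets one. The helper name search_previous_answers is hypothetical.

```python
import asyncio
import inspect
from typing import Any


def search_previous_answers(query: str, index_name: str = "answers") -> list[Any]:
    """Hypothetical helper: search the index of previous answers from Python.

    Assumes paper-qa v5.0.x: `search_query(query, index_name, settings)` in
    `paperqa.agents`. Imports are deferred so this module loads without paper-qa.
    """
    from paperqa import Settings
    from paperqa.agents import search_query

    result = search_query(query, index_name, Settings())
    # Some releases may expose search_query as a coroutine; run it if so.
    if inspect.iscoroutine(result):
        result = asyncio.run(result)
    return list(result)
```

Each returned item is expected to be a (result, location) pair; for the answers index, the maintainer notes that the results are answer objects.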
I guess I can get that from the documentation, but again, this uses the CLI; I want to execute this search in Python.
To reuse the index from a previous ask call in Python, pass ask a Settings object configured the same way as when the index was built. Here is an example:
from paperqa import Settings, ask
# Define your query and settings
query = "What manufacturing challenges are unique to bispecific antibodies?"
settings = Settings(paper_directory="path/to/your/papers")  # the same paper directory used when the index was built
# Execute the search
answer = ask(query, settings=settings)
# Print the answer
print(answer)
In this example, paper_directory must point to the same directory of papers you used when the index was first built, not to the index directory itself. Because the index name is derived from the Settings, ask can then find and reuse the existing index instead of building a new one [1][2].
hey @ivanhigueram -- you should be able to find all your indexes in the PQA_HOME directory (defaults to ~/.pqa/indexes/), and you can see the names for each of them there. All answers (no matter where they originated) go into the answers index, which you can also search via the search_query function. In that case, it'll return Answer objects with relevant details for each answer.
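Those names can also be listed from Python. The minimal sketch below uses only the standard library and assumes the default location ~/.pqa/indexes/ mentioned above, plus an on-disk layout where each index is its own subdirectory (an assumption, not something stated in this thread). list_index_names is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def list_index_names(index_root: str | Path = "~/.pqa/indexes") -> list[str]:
    """Return the names of the index subdirectories under index_root.

    Assumes each paper-qa index (including "answers") is its own subdirectory;
    returns an empty list if the directory does not exist yet.
    """
    root = Path(index_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Stray files in that directory are skipped, so only index folders are reported.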
If you have several paper indexes, and you're trying to work out which goes with your settings of interest, just make a new Settings object with the paper directory and settings that you previously used and then do: settings.get_index_name() -- it'll tell you the name of the index.
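That lookup can be paired with a check for whether the matching index is already on disk. In the sketch below, settings.get_index_name() comes from the comment above; the ~/.pqa/indexes/ location and the one-subdirectory-per-index layout are assumptions, and both helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def index_exists(index_name: str, index_root: str | Path = "~/.pqa/indexes") -> bool:
    # Assumption: each index is stored as a subdirectory named after the index.
    return (Path(index_root).expanduser() / index_name).is_dir()


def index_name_for(paper_directory: str) -> str:
    """Hypothetical helper: the index name paper-qa derives for these settings.

    Use the same paper_directory (and any other options) as when the index was
    built; different settings are expected to yield a different index name.
    """
    from paperqa import Settings  # deferred so this module loads without paper-qa

    return Settings(paper_directory=paper_directory).get_index_name()
```

For example, compute name = index_name_for("path/to/your/papers"). If index_exists(name) is true, passing those same Settings to ask or search_query should reuse that index rather than rebuild it.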
Hi @ivanhigueram, we just documented this: https://github.com/Future-House/paper-qa/tree/v5.0.5#reusing-index
If you have any other questions, feel free to open a new issue. Thanks for pointing out areas for improvement to us; it's appreciated.