Incorrect LLM Used and API Environment Variable Warnings

Open jukti3742 opened this issue 1 year ago • 3 comments

When running the script, I encountered two main issues:

  1. The library still uses the GPT-4o model even though I set the LLM to gpt-4o-mini, which leads to higher costs.
  2. I received warnings about missing environment variables for external APIs, and I don't know why.

Code Snippet:

import os
from paperqa import Docs, Settings
.
.
.
doc_paths = get_absolute_paths(folder_path)
docs = Docs()

for doc in doc_paths:
    docs.add(doc)

settings = Settings()
settings.llm = "gpt-4o-mini"
settings.summary_llm = "gpt-4o-mini"
settings.prompts.qa = my_qa_prompt
# settings.answer.answer_max_sources = 3

answer = docs.query(
    question,
    settings=settings,
)

print(answer.formatted_answer)

Warnings:

SEMANTIC_SCHOLAR_API_KEY environment variable not set. Semantic Scholar API rate limits may apply.
CROSSREF_MAILTO environment variable not set. Crossref API rate limits may apply.
CROSSREF_API_KEY environment variable not set. Crossref API rate limits may apply.

jukti3742 avatar Oct 25 '24 13:10 jukti3742

Hey there, @jukti3742! I'm Dosu, a bot here to help you with bugs, questions, and becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.

  1. LLM Defaulting Issue: The paperqa library defaults to using "gpt-4o-2024-08-06" as the LLM. If your script is still using this model despite setting settings.llm and settings.summary_llm to "gpt-4o-mini", ensure that these settings are correctly applied before the query method is called. Double-check that there are no other parts of your code or library defaults that might override these settings.

  2. Environment Variable Warnings: The warnings about missing environment variables indicate that the library is trying to access external APIs without the necessary credentials. For Semantic Scholar and Crossref, you need to set the SEMANTIC_SCHOLAR_API_KEY, CROSSREF_MAILTO, and CROSSREF_API_KEY environment variables to avoid rate limits. You can set these in your environment using:

    export SEMANTIC_SCHOLAR_API_KEY='your_api_key'
    export CROSSREF_MAILTO='your_email'
    export CROSSREF_API_KEY='your_api_key'
    

    These variables are optional; as the warnings say, API rate limits may apply when they are not set [1][2].
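As a minimal sketch, the three variable names from the warnings can be checked before a run to see which ones are unset. The helper `missing_api_vars` is a hypothetical name for illustration and is not part of paper-qa; it only reads the environment.

```python
import os

# Variable names copied from the warnings in this issue.
OPTIONAL_API_VARS = (
    "SEMANTIC_SCHOLAR_API_KEY",
    "CROSSREF_MAILTO",
    "CROSSREF_API_KEY",
)


def missing_api_vars(environ=None):
    """Return the names of the listed variables that are unset or empty.

    Hypothetical helper, not a paper-qa function. Pass a dict to check
    something other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in OPTIONAL_API_VARS if not env.get(name)]


if __name__ == "__main__":
    for name in missing_api_vars():
        print(f"{name} is not set")
```

Running the file directly prints one line per unset variable, which should match the warnings shown when the script starts.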

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 25 '24 13:10 dosubot[bot]

Not helpful, since settings are correctly set to "gpt-4o-mini" before the query method is called. Please see the code snippet.

jukti3742 avatar Oct 25 '24 13:10 jukti3742

For the warnings, please read https://github.com/Future-House/paper-qa#installation

For the model selection issue:

I had the same problem as you, and realized that the documentation has not been updated to reflect API changes.

The model settings are actually taken from the agent key of the Settings object, and not the llm or summary_llm as the docs would suggest. The fix is to also give your model name inside that key as follows:


from paperqa import Settings
from paperqa.agents.main import AgentSettings

settings = Settings(
    agent=AgentSettings(
        agent_llm="gpt-4o-mini",  # your desired LLM
    ),
)

If this fixes your problem, please close the issue. I'll open a new issue for the stale docs.
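Putting the pieces of this thread together, here is a rough sketch that sets the same model name in all three places mentioned so far: `llm`, `summary_llm`, and `agent_llm`. The helper names `model_fields` and `build_settings` are hypothetical, and the import path for `AgentSettings` is the one used above. It assumes `Settings` accepts `llm` and `summary_llm` as constructor arguments, matching the attributes set in the original snippet.

```python
def model_fields(model: str) -> dict:
    """Map each model-related field named in this thread to one model name."""
    return {"llm": model, "summary_llm": model, "agent_llm": model}


def build_settings(model: str):
    """Build a Settings object that uses `model` everywhere.

    Hypothetical helper. The imports are done lazily so that this file
    can be loaded even where paper-qa is not installed.
    """
    from paperqa import Settings
    from paperqa.agents.main import AgentSettings  # path as used in this thread

    fields = model_fields(model)
    return Settings(
        llm=fields["llm"],
        summary_llm=fields["summary_llm"],
        agent=AgentSettings(agent_llm=fields["agent_llm"]),
    )
```

With this in place, `settings = build_settings("gpt-4o-mini")` could be passed to `docs.query(question, settings=settings)` as in the original snippet.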

mmtftr avatar Nov 03 '24 09:11 mmtftr