
Example using locally-hosted model is not working

Open nleguillarme opened this issue 1 year ago • 7 comments

I am trying to use paper-qa with a locally-hosted model. However, the provided example:

from paperqa import Settings, ask

local_llm_config = dict(
    model_list=[
        dict(
            model_name="my_llm_model",
            litellm_params=dict(
                model="my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=512,
            ),
        )
    ]
)

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="my-llm-model",
        llm_config=local_llm_config,
        summary_llm="my-llm-model",
        summary_llm_config=local_llm_config,
    ),
)

raises the following exception:

litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=my-llm-model
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers

nleguillarme avatar Oct 08 '24 08:10 nleguillarme

Hello,

PaperQA's documentation is not very clear about this... it took me a lot of trial and error to figure out what's going on. You have to prefix the model name with the inference provider. With a llamafile server you have to specify "openai/my-llm-model" as the model name; with Ollama, "ollama/my-llm-model".

Example for a llamafile hosted locally:

local_llm_config = dict(
    model_list=[
        dict(
            model_name="openai/my-llm-model",
            litellm_params=dict(
                model="openai/my-llm-model",
                api_base="http://localhost:8080/v1",
                api_key="sk-no-key-required",
                temperature=0.1,
                frequency_penalty=1.5,
                max_tokens=1024,
            ),
        )
    ]
)
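
A similar config works for a model served with Ollama. This is just a sketch assuming you have already pulled a model named "llama3" and that the Ollama server listens on its default port 11434; adapt the names to your setup:

ollama_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/llama3",
            litellm_params=dict(
                model="ollama/llama3",
                api_base="http://localhost:11434",
                temperature=0.1,
                max_tokens=1024,
            ),
        )
    ]
)

In that case, the llm and summary_llm settings would be "ollama/llama3".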

However, you'll still get an API connection error that seems to be caused by the embedding model... So I don't use the ask function and use docs.query instead, as follows:

import os

from tqdm import tqdm

from paperqa import Docs, SparseEmbeddingModel

# List the PDF files to index (adapt the directory to your setup)
file_list = os.listdir("./Papers")

embedding_model = SparseEmbeddingModel(ndim=256)

docs = Docs()

for doc in tqdm(file_list):
    try:
        docs.add(
            os.path.join("./Papers", doc),
            citation="File " + doc,
            docname=doc,
            settings=settings,
            embedding_model=embedding_model,
        )
    except Exception as e:
        # sometimes this happens if PDFs aren't downloaded or readable
        print("Could not read", doc, e)
        continue

answer = docs.query(
    "Your question.",
    settings=settings,
    embedding_model=embedding_model,
)

I guess clearer and more complete documentation would be welcome.

Hope it helps.

Best regards.

Snikch63200 avatar Oct 08 '24 09:10 Snikch63200

I think it's simply not implemented, or not merged into the main branch. If you search for "API_BASE" in the project, you won't find any relevant code: https://github.com/search?q=repo%3AFuture-House%2Fpaper-qa+API_BASE&type=code

thiner avatar Oct 08 '24 09:10 thiner

Hi @thiner, we use litellm and it handles that kind of config. It should be parsed and you can find more information here

whitead avatar Oct 09 '24 20:10 whitead

I see. But why run LiteLLM inside PQA? It would be better to deploy the service independently and decouple the model configuration from PQA itself. It's also common to already have a LiteLLM instance running, so configuring another one seems redundant.
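
For what it's worth, an already-running LiteLLM proxy can still be targeted from PQA, since the proxy exposes an OpenAI-compatible API. A rough sketch, assuming a proxy at localhost:4000 serving an alias called my-proxy-model (the alias, port, and key are placeholders):

from paperqa import Settings, ask

# Sketch only: point paper-qa at an external OpenAI-compatible endpoint
# (e.g. a LiteLLM proxy you already run). Alias, port, and key are illustrative.
proxy_llm_config = dict(
    model_list=[
        dict(
            model_name="openai/my-proxy-model",
            litellm_params=dict(
                model="openai/my-proxy-model",
                api_base="http://localhost:4000/v1",
                api_key="sk-anything",
            ),
        )
    ]
)

answer = ask(
    "Your question.",
    settings=Settings(
        llm="openai/my-proxy-model",
        llm_config=proxy_llm_config,
        summary_llm="openai/my-proxy-model",
        summary_llm_config=proxy_llm_config,
    ),
)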

thiner avatar Oct 10 '24 06:10 thiner

Hi @Snikch63200, thank you for your help. Could you also share the content of the settings variable, please?

nleguillarme avatar Oct 17 '24 08:10 nleguillarme

Sure,

settings = Settings(
    llm="openai/my-llm-model",
    llm_config=local_llm_config,
    summary_llm="openai/my-llm-model",
    summary_llm_config=local_llm_config,
    index_directory="indexes",
    paper_directory="./Papers",
)

Best regards.

NB: I couldn't get a local LLM embedding model to work, so I fall back on SparseEmbeddingModel, which is not optimal because it doesn't really understand the meaning of the question and only matches keywords.

Snikch63200 avatar Oct 17 '24 08:10 Snikch63200

We've added a new feature to use the local sentence-transformers library, which may be an easier route than trying to get litellm configured correctly for local embeddings:

https://github.com/Future-House/paper-qa?tab=readme-ov-file#local-embedding-models-sentence-transformers
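
For example (a sketch based on that README section, reusing the local_llm_config from the examples above; double-check the "st-" prefix and the model name against the docs for your installed version):

from paperqa import Settings, ask

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm="openai/my-llm-model",
        llm_config=local_llm_config,
        summary_llm="openai/my-llm-model",
        summary_llm_config=local_llm_config,
        # "st-" selects a local sentence-transformers embedding model
        embedding="st-multi-qa-MiniLM-L6-cos-v1",
    ),
)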

whitead avatar Oct 18 '24 23:10 whitead