
Could you kindly consider unifying the writing styles for the usage of settings?

Open fiyen opened this issue 1 year ago • 3 comments

This is an excellent project, but manually configuring the APIs seems overly complicated. The same settings do not have the same effect across different commands, which makes things very difficult for beginners. For example, the LLM settings for doc.query can be written as:

```python
llm_config = dict(
    model_list=[
        dict(
            model_name='gpt-4o-mini',
            litellm_params=dict(
                model='gpt-4o-mini',
                api_key="xxx",
                api_base="https://"
            )
        ),
    ],
)
```

However, the embedding settings must be written like this to be effective:

```python
embedding_config = dict(
    model_list=[
        dict(
            model_name='text-embedding-3-small',
            litellm_params=dict(
                model='text-embedding-3-small',
                api_key="xxx",
                api_base="xxx"
            )
        )
    ],
    api_key="xxx",
    api_base="https://xxxx"
)
```
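To keep the duplicated api_key/api_base values from drifting apart, they could be generated from a single place. A minimal sketch, where build_embedding_config is a hypothetical helper (not part of paper-qa):

```python
def build_embedding_config(model: str, api_key: str, api_base: str) -> dict:
    """Build an embedding config with the credentials stated once, then
    copied to both places the lookup may happen: inside litellm_params
    and at the top level of the config dict."""
    return dict(
        model_list=[
            dict(
                model_name=model,
                litellm_params=dict(model=model, api_key=api_key, api_base=api_base),
            )
        ],
        # duplicated at the top level because, per this issue, the embedding
        # path reads api_key/api_base here rather than from model_list
        api_key=api_key,
        api_base=api_base,
    )
```

This only hides the duplication; the underlying lookup asymmetry remains.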

This is because when the embedding code looks up api_key and api_base, it does not use the parameters inside model_list. Even more frustrating, when using doc.query the settings can be written like this:

```python
settings = Settings(
    llm='gpt-4o-mini',
    summary_llm='gpt-4o-mini',
    llm_config=llm_config,
    summary_llm_config=llm_config,
    embedding='text-embedding-3-small',
    embedding_config=embedding_config
)
```

However, the same writing style does not work in ask, because api_key and api_base are looked up differently there. Admittedly, this comes from the different usage modes of the tools, but having different settings conventions in the same project causes a lot of confusion. It would be great to unify the writing style of the settings.

fiyen avatar Sep 27 '24 02:09 fiyen

If the values of the API-related params are not given in some modules, these values could fall back to the global_xx values.
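That fallback could be a small merge step applied before each module builds its client. A sketch under that assumption, with hypothetical names (global_defaults is not an existing paper-qa setting):

```python
def with_global_fallback(litellm_params: dict, global_defaults: dict) -> dict:
    """Fill in any API params a module did not set from the global values."""
    merged = dict(global_defaults)  # start from the global_xx values
    # module-level values win whenever they are actually provided
    merged.update({k: v for k, v in litellm_params.items() if v is not None})
    return merged
```

A module-level api_base would win when present; otherwise the global one is used.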

fiyen avatar Sep 27 '24 02:09 fiyen

The problem I mentioned with ask may be caused by the function litellm_get_search_query in helpers.py. Although query contains all of the settings information, it is dropped in this function, which keeps only the 'llm' and 'temperature' params. However, the api_key and api_base information is in fact still needed in model.run_prompt. I added an extra parameter, litellm_params, to this function:

```python
async def litellm_get_search_query(
    question: str,
    count: int,
    template: str | None = None,
    llm: str = "gpt-4o-mini",
    temperature: float = 1.0,
    litellm_params: dict | None = None,  # newly added; None avoids a shared mutable default
) -> list[str]:
    search_prompt = ""
    if isinstance(template, str) and all(
        x in template for x in ("{count}", "{question}", "{date}")
    ):
        # partial formatting
        search_prompt = template.replace("{date}", get_year())
    elif isinstance(template, str):
        logger.warning(
            "Template does not contain {count}, {question} and {date} variables."
            " Ignoring template and using default search prompt."
        )
    if not search_prompt:
        search_prompt = (
            "We want to answer the following question: {question}\nProvide"
            " {count} unique keyword searches (one search per line) and year ranges"
            " that will find papers to help answer the question. Do not use boolean"
            " operators. Make sure not to repeat searches without changing the"
            " keywords or year ranges. Make some searches broad and some narrow. Use"
            " this format: [keyword search], [start year]-[end year]. where end year"
            f" is optional. The current year is {get_year()}."
        )

    model = LiteLLMModel(name=llm)
    model.config["model_list"][0]["litellm_params"].update({"temperature": temperature})
    model.config["model_list"][0]["litellm_params"].update(litellm_params or {})  # newly added
    result = await model.run_prompt(
        prompt=search_prompt,
        data={"question": question, "count": count},
        skip_system=True,
    )
    ...
```

After also threading the new parameter through the parent function, my problem was solved.
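For illustration, the caller could pull the values to forward out of an existing model_list-style config. Here extract_litellm_params is a hypothetical helper, not code from paper-qa:

```python
def extract_litellm_params(llm_config: dict) -> dict:
    """Pull api_key/api_base out of a model_list-style config so they can be
    passed to the new litellm_params argument of litellm_get_search_query."""
    params = llm_config["model_list"][0]["litellm_params"]
    return {k: params[k] for k in ("api_key", "api_base") if k in params}
```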

fiyen avatar Sep 27 '24 03:09 fiyen

Yeah @fiyen, we have a lot of work to do here. We pushed hard to open source a huge amount in a very small window of time, and we are going back and improving everything over time.

jamesbraza avatar Sep 27 '24 05:09 jamesbraza