paper-qa
Impact of 'num_ctx' and 'num_predict' parameters on answer quality.
Hello,
I currently use paperQA with Llama3.1:70b served by Ollama.
With the default LLM parameters, answer quality is often poor, especially when I increase answer.evidence_k and answer.answer_max_sources. With the default answer settings, the answer is merely 'correct'. Theoretically, increasing these parameters should improve answer relevance instead...
I guess it's a problem with the LLM parameters, but I'm not sure...
So I am wondering what the exact impact of the 'num_ctx' and 'num_predict' parameters is on answer quality.
local_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/llama3.1:70b",
            litellm_params=dict(
                model="ollama/llama3.1:70b",
                api_type="ollama",
                api_base="http://192.168.1.15:11434",
                num_ctx=2048,     # Context window size (default: 2048)
                num_predict=128,  # Maximum number of tokens to predict when generating text (default: 128, -1 = infinite generation, -2 = fill context)
            ),
        ),
    ]
)
Best regards.
I couldn't find specific information about the impact of the 'num_ctx' and 'num_predict' parameters on answer quality in the context of using Llama3.1 with paperQA. However, generally speaking:
- The num_ctx parameter typically defines the context window size, which is the number of tokens the model can consider at once. A larger context window allows the model to take more information into account, potentially improving the relevance and coherence of the generated answers.
- The num_predict parameter controls the maximum number of tokens the model will generate in response. A higher value might lead to more detailed answers, but it could also increase the risk of generating less relevant or more verbose responses.
Adjusting these parameters can help balance between the depth of context considered and the length of the generated text, which might influence the quality of the answers. Experimenting with these settings while monitoring the quality of the output could help you find an optimal configuration for your use case.
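For example, here is a minimal sketch of the same configuration with both limits raised. The 16384 and 1024 values are illustrative assumptions, not recommendations; tune them to your hardware and to your answer.evidence_k and answer.answer_max_sources settings.

# Sketch only: larger num_ctx lets more retrieved evidence fit into the prompt,
# and larger num_predict allows longer answers. The exact values below are
# assumptions for illustration, not tested recommendations.
local_llm_config = dict(
    model_list=[
        dict(
            model_name="ollama/llama3.1:70b",
            litellm_params=dict(
                model="ollama/llama3.1:70b",
                api_type="ollama",
                api_base="http://192.168.1.15:11434",
                num_ctx=16384,     # larger context window so more evidence chunks can be considered
                num_predict=1024,  # allow longer generated answers (-1 = infinite, -2 = fill context)
            ),
        ),
    ]
)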