Support setting context window size for Ollama models (num_ctx)
**Is your feature request related to a problem? Please describe.**
When uploading a document, I see the following warning in the Ollama logs:
```
[GIN] 2025/02/16 - 16:26:54 | 200 | 10.7681ms | 10.20.1.73 | POST "/api/show"
time=2025-02-16T16:27:30.444-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2363 keep=5 new=2048
[GIN] 2025/02/16 - 16:27:34 | 200 | 4.2324405s | 10.20.1.73 | POST "/api/generate"
[GIN] 2025/02/16 - 16:27:34 | 200 | 10.9234ms | 10.20.1.73 | POST "/api/show"
[GIN] 2025/02/16 - 16:27:34 | 200 | 11.0538ms | 10.20.1.73 | POST "/api/show"
```
The prompt size (2363 tokens) exceeds the default limit of 2048, so the input is truncated. I would like the ability to set the size of the context window to prevent this truncation.
I was not able to find an existing config option that controls this setting. Please let me know if I missed something in the docs.
While searching through the issues in this repo, I came across https://github.com/SciPhi-AI/R2R/pull/1033, which covers the feature I am after; however, it looks like that PR was closed.
**Describe the solution you'd like**
It would be nice to have an additional config key such as num_ctx (the same option Ollama expects when you create a client). That way, users can adjust the context window as needed.
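Purely as an illustration of the idea, not R2R's actual config handling: a minimal sketch of forwarding a configured value to Ollama via the official ollama Python package. The variable name, host, and model below are assumptions; passing options={"num_ctx": ...} is the standard way to override the context window on the Ollama client.

```python
import ollama

num_ctx = 8192  # imagine this value being read from R2R's config file

client = ollama.Client(host="http://localhost:11434")
response = client.generate(
    model="llama3",  # hypothetical: any locally pulled model
    prompt="Summarize the uploaded document.",
    options={"num_ctx": num_ctx},  # lifts the 2048-token default
)
print(response["response"])
```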
When I create an Ollama client myself, I use the snippet below to set the context size via the num_ctx key:
```python
from langchain_ollama import ChatOllama  # LangChain's Ollama chat wrapper

llm = ChatOllama(
    model=ollama_model,
    temperature=ollama_temp,
    base_url=ollama_base_url,
    num_ctx=ollama_context_size,  # context window size passed through to Ollama
)
```
This works as expected, and I am able to increase the context window beyond the default 2048 tokens.
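For completeness, the same option can be set directly on the REST endpoint that shows up in the logs above. A minimal check, assuming a local Ollama server (the model name here is an assumption):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",             # assumption: any locally pulled model
        "prompt": "hello",
        "stream": False,               # return one JSON object instead of a stream
        "options": {"num_ctx": 8192},  # overrides the 2048-token default for this request
    },
)
print(resp.json()["response"])
```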
**Describe alternatives you've considered**
N/A

**Additional context**
N/A