csoriano2718
This PR adds RAG mode control via the `RAG_MODE` environment variable, giving users clear control over how the RAG proxy balances document retrieval with general AI knowledge.

## RAG Modes...
The RAG proxy was not forwarding `reasoning_content` from the upstream model server to clients, so reasoning models (deepseek-r1, qwq, etc.) appeared to work but never returned their reasoning process....
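A minimal Python sketch of the kind of fix described above, assuming the upstream server streams OpenAI-compatible chat-completion chunks and puts the model's reasoning in a `reasoning_content` field of each `delta`. The helpers `relay_chunk` and `to_sse` are hypothetical and not taken from the RAG proxy's code; the point illustrated is that the proxy copies the whole `delta` instead of rebuilding it from `content` alone.

```python
import json
from typing import Any


def relay_chunk(upstream: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: build the chunk sent to the client from one
    upstream streaming chunk."""
    choices = []
    for choice in upstream.get("choices", []):
        # Copy the whole delta instead of keeping only "content", so
        # "reasoning_content" reaches the client alongside the answer text.
        delta = dict(choice.get("delta") or {})
        choices.append({**choice, "delta": delta})
    # Top-level metadata (id, model, created, ...) is passed through as-is.
    return {**upstream, "choices": choices}


def to_sse(chunk: dict[str, Any]) -> str:
    """Hypothetical helper: serialize a chunk as one Server-Sent Events event."""
    return f"data: {json.dumps(chunk)}\n\n"
```

Copying the delta wholesale, rather than whitelisting known keys, also keeps any other field the upstream server adds from being silently dropped.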
## Summary

Request to expose llama.cpp's `--reasoning-budget` flag in `ramalama serve` to properly control reasoning/thinking behavior in models like DeepSeek-R1.

## Background

- llama.cpp added the `--reasoning-budget` flag (PR [#13771](https://app.semanticdiff.com/gh/ggml-org/llama.cpp/pull/13771/overview))...
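One way a serve command could pass this setting through to llama.cpp is sketched below in Python. The helper `build_llama_server_cmd` and its `reasoning_budget` parameter are hypothetical, not RamaLama's actual code; only the `--reasoning-budget` flag name comes from llama.cpp. Leaving the value as `None` omits the flag so llama.cpp's own default applies; per llama.cpp's help text, `0` disables thinking and `-1` leaves it unrestricted.

```python
from typing import Optional


def build_llama_server_cmd(model_path: str, port: int,
                           reasoning_budget: Optional[int] = None) -> list[str]:
    """Hypothetical helper: assemble a llama-server command line.

    reasoning_budget is forwarded verbatim as --reasoning-budget.
    None means the flag is not passed and llama.cpp's default is used.
    Assumption: llama.cpp accepts -1 (unrestricted) or 0 (thinking disabled).
    """
    cmd = ["llama-server", "--model", model_path, "--port", str(port)]
    if reasoning_budget is not None:
        cmd += ["--reasoning-budget", str(reasoning_budget)]
    return cmd
```

Forwarding the value only when the user sets it keeps existing invocations unchanged and leaves the meaning of each budget value to llama.cpp itself.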