csoriano2718

Results 3 issues of csoriano2718

This PR adds RAG mode control via the `RAG_MODE` environment variable, giving users clear control over how the RAG proxy balances document retrieval with general AI knowledge.

## RAG Modes...

The RAG proxy was not forwarding `reasoning_content` from the upstream model server to clients, so reasoning models (deepseek-r1, qwq, etc.) appeared to work but never returned their reasoning process....
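To illustrate the kind of pass-through this fix describes, here is a minimal sketch. It assumes OpenAI-style streaming chunks in which each choice carries a `delta` object. The helper name `forward_delta` and the `PASSTHROUGH_FIELDS` tuple are hypothetical and are not taken from the proxy's source.

```python
from typing import Any, Dict

# Hypothetical whitelist of delta fields the proxy relays to the client.
# Leaving "reasoning_content" out of such a list would silently drop the
# model's reasoning while the normal "content" stream keeps working.
PASSTHROUGH_FIELDS = ("role", "content", "reasoning_content")


def forward_delta(upstream_chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a streaming chunk, keeping reasoning_content next to content."""
    choices = []
    for choice in upstream_chunk.get("choices", []):
        delta = choice.get("delta", {})
        # Copy only the fields that are present; do not invent empty ones.
        kept = {key: delta[key] for key in PASSTHROUGH_FIELDS if key in delta}
        choices.append(
            {
                "index": choice.get("index", 0),
                "delta": kept,
                "finish_reason": choice.get("finish_reason"),
            }
        )
    # Preserve top-level metadata (id, model, ...) and swap in the new choices.
    result = {k: v for k, v in upstream_chunk.items() if k != "choices"}
    result["choices"] = choices
    return result
```

A client reading the relayed stream can then show `delta["reasoning_content"]` separately from `delta["content"]`.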

## Summary

Request to expose llama.cpp's `--reasoning-budget` flag in `ramalama serve` to properly control reasoning/thinking behavior in models like DeepSeek-R1.

## Background

- llama.cpp added the `--reasoning-budget` flag (PR [#13771](https://app.semanticdiff.com/gh/ggml-org/llama.cpp/pull/13771/overview))...
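As a rough sketch of what exposing the flag could involve, the helper below builds a `llama-server` command line and appends `--reasoning-budget` only when a value is supplied. The function `llama_server_args` and its parameters are hypothetical and do not reflect how `ramalama serve` is implemented. The llama.cpp help text has described `-1` as unrestricted and `0` as disabling thinking, but treat those values as an assumption to verify against your build.

```python
from typing import List, Optional


def llama_server_args(model_path: str, reasoning_budget: Optional[int] = None) -> List[str]:
    """Build an argv list for llama-server (hypothetical wrapper)."""
    args = ["llama-server", "--model", model_path]
    if reasoning_budget is not None:
        # Forward the value unchanged; validating it is left to llama-server.
        args += ["--reasoning-budget", str(reasoning_budget)]
    return args
```

Leaving the flag out when no value is given keeps the server's own default in effect, so existing invocations behave as before.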