csoriano2718 issues


Results: 3 issues of csoriano2718

Add RAG modes and strengthen strict mode

This PR adds RAG mode control via the `RAG_MODE` environment variable, giving users clear control over how the RAG proxy balances document retrieval with general AI knowledge.

## RAG Modes...

Pass through reasoning_content in RAG proxy streaming

The RAG proxy was not forwarding `reasoning_content` from the upstream model server to clients, so reasoning models (deepseek-r1, qwq, etc.) appeared to work but never returned their reasoning process...
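
To make the symptom concrete, here is a minimal sketch of a streaming pass-through that keeps the reasoning field. It is not the project's actual code: the helper names `forward_delta` and `to_sse` are hypothetical, and it assumes the upstream server emits OpenAI-style chat completion chunks whose `delta` may carry `reasoning_content` alongside `content`.

```python
import json

# Fields copied from the upstream delta. A proxy that omits
# "reasoning_content" here still streams the answer, but silently
# drops the model's reasoning, which matches the bug described above.
FORWARDED_KEYS = ("role", "content", "reasoning_content")


def forward_delta(upstream_chunk: dict) -> dict:
    """Build the client-facing delta from one upstream streaming chunk (hypothetical helper)."""
    choices = upstream_chunk.get("choices") or [{}]
    delta = choices[0].get("delta", {})
    # Keep only known keys that are actually present and non-null.
    return {k: delta[k] for k in FORWARDED_KEYS if delta.get(k) is not None}


def to_sse(delta: dict) -> str:
    """Wrap a delta in a server-sent-events data line (assumed wire format)."""
    payload = {"choices": [{"index": 0, "delta": delta}]}
    return "data: " + json.dumps(payload) + "\n\n"
```

With this shape, a chunk that carries only reasoning text is still forwarded rather than reduced to an empty delta.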

RFE: Add --reasoning-budget flag to control thinking in reasoning models

## Summary
Request to expose llama.cpp's `--reasoning-budget` flag in `ramalama serve` to properly control reasoning/thinking behavior in models like DeepSeek-R1.

## Background
- llama.cpp added the `--reasoning-budget` flag (PR [#13771](https://app.semanticdiff.com/gh/ggml-org/llama.cpp/pull/13771/overview))...