[Bug]: Tools for querying vector DB not called when running gpt-oss:20b on llama.cpp server
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
I have used Ollama with gpt-oss:20b for a while, but now I have to switch from Ollama to the llama.cpp server.
With Ollama the tool calls work - I get the needed response in the LightRAG chat. With llama-server, the tool calls do not seem to be triggered at all.
Now I am in doubt where the issue is, since llama.cpp has a guide on how to properly run gpt-oss models so that tools work: https://github.com/ggml-org/llama.cpp/discussions/15396. Some say the issue is partly in the jinja/chat template format - which would put it outside LightRAG's scope. But others say that clients sending requests to the LLM also have to follow specific formats: https://github.com/ggml-org/llama.cpp/discussions/15341
In the end, I have to use LightRAG with this model AND llama.cpp; Ollama is not an option anymore. Please help - what else can I try?
Also my bug in llama.cpp: https://github.com/ggml-org/llama.cpp/issues/17410
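To narrow down whether the problem is in llama.cpp or in LightRAG, one thing worth trying is sending a tool-call request straight to llama-server's OpenAI-compatible endpoint, bypassing LightRAG entirely. A minimal sketch, assuming llama-server listens on localhost:8080; the tool name and schema here are made up just for illustration:

```bash
# send a bare chat-completions request with a dummy tool definition
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Find notes about vector databases."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "query_vector_db",
        "description": "Retrieve documents relevant to a query",
        "parameters": {
          "type": "object",
          "properties": {"query": {"type": "string"}},
          "required": ["query"]
        }
      }
    }]
  }'
```

If the response message contains no `tool_calls` even here, the problem is on the llama.cpp side (chat template / launch flags) rather than in LightRAG.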
Steps to reproduce
- run llama-server with the gpt-oss:20b model
- configure the LightRAG .env file to use llama-server: LLM type = openai, base URL = IP:port/v1 pointing at the llama-server (a sketch of both steps follows below)
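For reference, a sketch of the setup I'd expect (model path, port, and the exact env variable names are assumptions; `--jinja` is what the linked llama.cpp guide recommends so that the gpt-oss chat template, and hence tool calling, is applied):

```bash
# launch llama-server with the Jinja chat template enabled
llama-server -m gpt-oss-20b.gguf --host 0.0.0.0 --port 8080 --jinja
```

```bash
# LightRAG .env (variable names as in LightRAG's env.example; verify against your copy)
LLM_BINDING=openai
LLM_BINDING_HOST=http://IP:port/v1   # the llama-server endpoint
LLM_MODEL=gpt-oss-20b
LLM_BINDING_API_KEY=dummy            # llama-server accepts any non-empty key
```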
Expected Behavior
A chat response containing data retrieved from the RAG
LightRAG Config Used
Paste your config here
Logs and screenshots
No response
Additional Information
- LightRAG Version: v1.4.9.8/0251
- Operating System: Ubuntu 24.04.3
- Python Version: 3.21
- Related Issues:
- What specific output does LightRAG return in response to the query?
- Please provide the LightRAG server’s log output for further investigation.
If LightRAG fails to generate keywords, it is likely due to the LLM not returning the required JSON format.
PR #2401 improved the keyword extraction logic by using the response_format parameter to explicitly instruct the LLM to return the required JSON format. This enhancement improves compatibility and may resolve this issue. You can pull the latest code from the main branch and check if it works as expected.
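If you want to check directly whether the model honors that constraint, llama-server's OpenAI-compatible endpoint also accepts `response_format`; a sketch, with the endpoint and model name assumed as above:

```bash
# ask for a JSON-constrained completion directly from llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Extract keywords from: \"vector databases\". Return JSON."}],
    "response_format": {"type": "json_object"}
  }'
```

If this returns non-JSON text, the keyword extraction in LightRAG will fail regardless of the client-side logic.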
I've built a new Docker image after pulling on 25.11.2025 and got some warnings:
Models used in Ollama: qwen3-embedding:4b, gpt-oss:20b
The GPT-OSS:20B model may not be sufficiently powerful for knowledge graph extraction tasks. We recommend upgrading to a more capable LLM, such as Qwen3-30B-A3B-Instruct.