[Bug]: Tools for querying vector DB not called when running gpt-oss:20b on llama.cpp server
Do you need to file an issue?
- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.
Describe the bug
I have used Ollama with gpt-oss:20b for a while, but now I have to switch from Ollama to the llama.cpp server.
With Ollama the tool calls work - I get the needed response in the LightRAG chat. With llama-server, the tool calls do not seem to be triggered at all.
Now I am in doubt where the issue is, since llama.cpp has a guide on how to properly run gpt-oss models so that tools work: https://github.com/ggml-org/llama.cpp/discussions/15396. Some say the issue is partly in the jinja/chat template format - which would put it outside LightRAG's scope. But others say that clients sending requests to the LLM also have to follow specific formats: https://github.com/ggml-org/llama.cpp/discussions/15341
In the end, I have to use LightRAG with this model AND llama.cpp; Ollama is not an option anymore. Please help - what else can I try?
Also my bug in llama.cpp: https://github.com/ggml-org/llama.cpp/issues/17410
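To narrow down whether the problem is in llama.cpp or in LightRAG, one thing worth trying is sending a tool-call request straight to llama-server's OpenAI-compatible endpoint, bypassing LightRAG entirely. A minimal sketch, assuming llama-server listens on localhost:8080; the tool name and schema here are made up just for illustration:

```bash
# send a bare chat-completions request with a dummy tool definition
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Find notes about vector databases."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "query_vector_db",
        "description": "Retrieve documents relevant to a query",
        "parameters": {
          "type": "object",
          "properties": {"query": {"type": "string"}},
          "required": ["query"]
        }
      }
    }]
  }'
```

If the response message contains no `tool_calls` even here, the problem is on the llama.cpp side (chat template / launch flags) rather than in LightRAG.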
Steps to reproduce
- run llama-server with the gpt-oss:20b model
- configure the LightRAG .env file to use llama-server: LLM type = openai, base URL = IP:port/v1 pointing at the llama-server (a sketch of both steps follows below)
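For reference, a sketch of the setup I'd expect (model path, port, and the exact env variable names are assumptions; `--jinja` is what the linked llama.cpp guide recommends so that the gpt-oss chat template, and hence tool calling, is applied):

```bash
# launch llama-server with the Jinja chat template enabled
llama-server -m gpt-oss-20b.gguf --host 0.0.0.0 --port 8080 --jinja
```

```bash
# LightRAG .env (variable names as in LightRAG's env.example; verify against your copy)
LLM_BINDING=openai
LLM_BINDING_HOST=http://IP:port/v1   # the llama-server endpoint
LLM_MODEL=gpt-oss-20b
LLM_BINDING_API_KEY=dummy            # llama-server accepts any non-empty key
```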
Expected Behavior
A chat response containing data retrieved from the RAG
LightRAG Config Used
Paste your config here
Logs and screenshots
No response
Additional Information
- LightRAG Version: v1.4.9.8/0251
- Operating System: Ubuntu 24.04.3
- Python Version: 3.21
- Related Issues:
- What specific output does LightRAG return in response to the query?
- Please provide the LightRAG server’s log output for further investigation.
If LightRAG fails to generate keywords, it is likely due to the LLM not returning the required JSON format.
PR #2401 improved the keyword extraction logic by using the response_format parameter to explicitly instruct the LLM to return the required JSON format. This enhancement improves compatibility and may resolve this issue. You can pull the latest code from the main branch and check if it works as expected.
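If you want to check directly whether the model honors that constraint, llama-server's OpenAI-compatible endpoint also accepts `response_format`; a sketch, with the endpoint and model name assumed as above:

```bash
# ask for a JSON-constrained completion directly from llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Extract keywords from: \"vector databases\". Return JSON."}],
    "response_format": {"type": "json_object"}
  }'
```

If this returns non-JSON text, the keyword extraction in LightRAG will fail regardless of the client-side logic.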
I've built a new Docker image after pulling on 25.11.2025 and got some warnings:
Models used in Ollama: qwen3-embedding:4b, gpt-oss:20b
The GPT-OSS:20B model may not be sufficiently powerful for knowledge graph extraction tasks. We recommend upgrading to a more capable LLM, such as Qwen3-30B-A3B-Instruct.