Response in Chat isn't displayed during generation, only after it's finished
What happened?
When using this plugin, the model's response in the Chat tab stays empty until the AI completes the entire message, unlike other chat clients where each new incoming token is displayed immediately.
With slower local models (Ollama), this makes the plugin's chat unusable in practice.
Relevant log output or stack trace
No response
Steps to reproduce
No response
CodeGPT version
2.9.0-241.1
Operating System
Windows
How are you connecting to Ollama? It sounds like the stream request parameter is set to false.
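For reference, this is roughly what a non-streaming request against the native API looks like (a minimal sketch, assuming the default Ollama port and using llama3.1 only as an example model):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{ "role": "user", "content": "hello" }],
  "stream": false
}'

With "stream": false, Ollama returns a single JSON object only after the whole response has been generated, which would match the behavior you're describing.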
I don't see any settings
Hmm, if you're connecting via the Ollama provider, then this parameter isn't configurable. I'm unable to reproduce this issue; I've tried multiple models, including Llama 3.1 8B, with the most recent version of Ollama.
How did you check whether CodeGPT waits for the entire response before rendering it on the screen?
It has an interesting behavior: I tried the prompt "write numbers from 1 to 1000" and it got stuck as usual, but after a couple of minutes, around number 500, it suddenly wrote everything and started slowly continuing token by token (before getting stuck again around 650). I'm unable to reproduce that now, though. At the very moment CodeGPT starts showing any output, Ollama gets unlocked, meaning my other client starts generating its new response after having been blocked, so it's definitely not streaming. Are there any logs for the requests?
I am getting the same issue here: just install the plugin from the marketplace, set the backend to Ollama, chat, and you see the bug.
And if you set the Provider to "Custom OpenAI" and configure the Ollama URL and model manually, it works fine, so it seems this issue only affects the "Ollama" provider.
OK, I think I found the issue. Streaming via their POST /api/chat API seems to be broken. I will change the underlying API to use the /v1/chat/completions endpoint instead.
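Roughly, a streaming request against Ollama's OpenAI-compatible endpoint would look like this (model name and port are placeholders, not the plugin's actual configuration):

curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.1",
  "messages": [{ "role": "user", "content": "hello" }],
  "stream": true
}'

With "stream": true, this endpoint sends the response back incrementally as "data:" chunks instead of a single JSON body.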
Hi @carlrobertoh ... This is not resolved for me, using version 2.11.7-241.1 in PyCharm 2024.2.1 (Professional Edition).
ollama version is 0.3.13
It is still using /api/chat
Adding to this: /api/chat streams just fine by default, as tested with:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ]
}'
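(Since "stream" defaults to true for /api/chat, this command returns newline-delimited JSON objects as tokens are generated, rather than one final response.)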