Richard Ginsberg

3 comments by Richard Ginsberg

> This should be resolved by #3218

Just tested v0.1.30; the issue is still present. ![ollama-p40-issues-2](https://github.com/ollama/ollama/assets/819865/22751e8b-5af7-445a-bb5f-a7e6291a607f)

Above I confirmed the issue persists in v0.1.30. To confirm it wasn't new in v0.1.30, I tried v0.1.29 as well. Same issue. `docker run -d --gpus=all -v /home/username/ollama:/root/.ollama -p 11434:11434 --name...

FastChat streams output tokens on a different endpoint/module. Hoping it is on the roadmap to port streaming to `fastchat.serve.openai_api_server`.
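For context, OpenAI-compatible servers typically stream responses as Server-Sent-Events `data:` lines when `stream=true` is set. A minimal sketch of consuming such a stream on the client side; the helper name and sample payloads are illustrative assumptions, not FastChat's actual output:

```python
import json

def parse_sse_tokens(lines):
    """Extract delta content tokens from OpenAI-style 'data: {...}' SSE lines.

    Hypothetical helper: assumes each chunk follows the OpenAI streaming
    chat-completion shape, ending with a 'data: [DONE]' sentinel.
    """
    tokens = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            tokens.append(delta["content"])
    return tokens

# Illustrative sample of what a streaming endpoint might emit
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
print("".join(parse_sse_tokens(sample)))  # prints "Hello"
```

In practice the lines would come from an HTTP response opened with `stream=True` rather than a hard-coded list.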