frob

That's passing strange. ollama calculates two allocations of roughly 36 GiB (`memory.required.allocations="[36.6 GiB 36.1 GiB]"`) for 13 layers with a 6,7 split, yet the runner tries to allocate 45.9 GiB. Probably coincidence that the difference...

Possibly related: https://github.com/ollama/ollama/pull/9243

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md) will aid in debugging.

From the log, you loaded llama3.3:70b-instruct-q4_K_M and had chats using both the ollama API (`/api/chat`) and the OpenAI-compatible API (`/v1/chat/completions`). It looks like your last successful chat was at 08:51:51 using...
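For reference, the two endpoints look roughly like this against a default local install (port 11434 assumed; the prompt is a placeholder), sketched in Python:

```python
import requests

BASE = "http://localhost:11434"  # default local ollama port (assumption)
MODEL = "llama3.3:70b-instruct-q4_K_M"  # the model named in the log

# Native ollama API
r = requests.post(f"{BASE}/api/chat", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],  # placeholder prompt
    "stream": False,  # single JSON response instead of a stream
})
print(r.json()["message"]["content"])

# OpenAI-compatible API
r = requests.post(f"{BASE}/v1/chat/completions", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
})
print(r.json()["choices"][0]["message"]["content"])
```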

> It is also pretty easy to replicate from my experience with it.

If you can reproduce it reliably, it would be interesting to see if setting `num_predict` has any...
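A minimal sketch of that test, assuming a default local server and substituting whatever prompt triggers the problem: send the same request with and without a `num_predict` cap and see whether the behaviour changes.

```python
import requests

def chat(options):
    """Send one request to a local ollama server; `options` is the knob under test."""
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3.3:70b-instruct-q4_K_M",
        "messages": [{"role": "user", "content": "..."}],  # substitute the failing prompt
        "stream": False,
        "options": options,
    }, timeout=600)
    return r.json()

baseline = chat({})                  # default: generate until the model stops
capped = chat({"num_predict": 256})  # cap generation at 256 tokens (arbitrary value)
```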

> logs_freeze.txt

There's a lot of truncation happening. Multiple conversation turns are filling up the context buffer and ollama is discarding earlier parts of the conversation. This may lead to...

> I will need to look at how I can incorporate num_predict into librechat, since I would have to send it on every request. Is that correct?

It's probably easier...

Increasing `num_ctx` allows longer conversations to fit in the context buffer; `num_predict` protects against the model losing coherence. You don't need to do both. Long conversations aren't normally...
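To illustrate the first option, a sketch of raising `num_ctx` for a single request so a longer history fits (same assumptions as above; 8192 is an arbitrary example value, not a recommendation):

```python
import requests

# A longer multi-turn history (contents are placeholders)
history = [
    {"role": "user", "content": "first question ..."},
    {"role": "assistant", "content": "first answer ..."},
    {"role": "user", "content": "follow-up ..."},
]

r = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.3:70b-instruct-q4_K_M",
    "messages": history,
    "stream": False,
    "options": {"num_ctx": 8192},  # raise the context window for this request only
})
print(r.json()["message"]["content"])
```

Keep in mind that a larger `num_ctx` grows the KV cache and therefore VRAM use, which matters if you're also chasing OOM errors.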

When you get an OOM error, is all VRAM allocated? [Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) and the output of `nvidia-smi` will aid in debugging.