[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
Memory estimation for gemma3:27b-it-q4_K_M:

```
time=2025-06-16T16:25:06.614+03:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=63 layers.offload=63 layers.split="" memory.available="[22.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="20.9 GiB" memory.required.partial="20.9 GiB" memory.required.kv="1.6 GiB" memory.required.allocations="[20.9 GiB]" memory.weights.total="15.4 GiB" memory.weights.repeating="14.3 GiB"...
```
> What would be the problem?

There are layers running in system RAM, where inference is slower. Have you upgraded ollama recently? Downloaded a new version of the model? Upgraded...
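For reference, a rough way to read the offload line posted above (the numbers are copied from that log, and this is only a sketch of the comparison the log reports, not ollama's actual accounting):

```python
# Values copied from the msg=offload log line above (gemma3:27b-it-q4_K_M).
layers_model = 63                # layers.model: total layers in the model
layers_offload = 63              # layers.offload: layers placed on the GPU
memory_available_gib = 22.2      # memory.available: free VRAM seen by ollama
memory_required_full_gib = 20.9  # memory.required.full: weights + KV cache + buffers

# If the full requirement fits in available VRAM and every layer is offloaded,
# the whole model runs on the GPU; otherwise some layers stay in system RAM
# and token generation slows down noticeably.
if layers_offload < layers_model or memory_required_full_gib > memory_available_gib:
    print("partial offload: some layers are running in system RAM")
else:
    print("full offload: all layers are on the GPU")
```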
> I've upgraded ollama recently, yes.

Mystery solved. There have been recent changes to the estimation logic to reduce the chance of an OOM. You can force ollama to load...
> And this doesn't seem like a resolution, because it amounts to loading the model into RAM and running on the CPU.

It demonstrates how setting `num_gpu` can be used to control the...
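As a minimal sketch of overriding the layer count per request (assuming the official `ollama` Python client and a locally running server; the model tag and the value 63 are taken from the log above), keeping in mind that asking for more layers than actually fit in VRAM reintroduces the OOM risk the new estimator is trying to avoid:

```python
import ollama  # official Python client talking to the local ollama server

# Ask ollama to place all 63 layers on the GPU instead of relying on the
# (now more conservative) automatic memory estimate.
response = ollama.chat(
    model="gemma3:27b-it-q4_K_M",
    messages=[{"role": "user", "content": "Hello"}],
    options={"num_gpu": 63},  # number of layers to offload to the GPU
)
print(response["message"]["content"])
```

The same `options` object can be sent in the body of a raw `/api/chat` or `/api/generate` request, or the parameter can be set interactively with `/set parameter num_gpu 63` inside `ollama run`.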
[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will aid in debugging.
Nothing in the log indicates a problem. The CUDA backend was loaded, a portion of the layers was loaded onto the GPU, and the server returned successful HTTP codes. Try setting...
https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-windows
Your issue is that responses now take longer after upgrading to 0.9.1? As [explained above](https://github.com/ollama/ollama/issues/11087#issuecomment-2978456204), the new version of ollama is more conservative with memory allocations. You can [override](https://github.com/ollama/ollama/issues/11087#issuecomment-2977115053) `num_gpu`...