frob


> built Ollama with CUDA support on a SuSE system with CUDA 13.0.

12.8

> I don't think I can use 13.0 because it doesn't support my GPU.

13.0 supports...

Post the full [server log](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md).
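On a Linux install managed by systemd, the log can usually be captured with:

```
journalctl -e -u ollama
```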

Can you upload the PDF and give a pointer to the client that you are using?

I asked about the client so that I could replicate the problem as closely as possible, but that looked too time-consuming, so I instead tested ollama directly. Baseline: test with...

Your logs don't contain enough information; you removed the actual content that FastGPT is sending to ollama. If you add `OLLAMA_DEBUG=1` to your server environment and try again, the resulting...

You need to set `OLLAMA_DEBUG=1` in the server environment. Run this command:

```
sudo systemctl edit ollama
```

Add these lines:

```
[Service]
Environment="OLLAMA_DEBUG=1"
```

Save and exit, then run...
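A typical way to apply the override and then watch the debug output is a restart followed by tailing the journal; a minimal sketch, assuming a systemd-managed install:

```
sudo systemctl restart ollama
journalctl -f -u ollama
```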

Your context size is too small for the document you are attempting to summarize. For most of the tests, you used qwen2.5:32b with a context of 30001:...
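If the document needs more room, the context can also be raised per request through the API's `num_ctx` option; a quick sketch (model name from above, the value is illustrative):

```
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:32b",
  "prompt": "Summarize the document: ...",
  "options": { "num_ctx": 65536 }
}'
```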

You have `OLLAMA_NUM_PARALLEL` unset. In ollama-5.log, you set a context size of 96001. Since you have lots of VRAM, ollama first tries to load the model with `OLLAMA_NUM_PARALLEL=4`. That is,...
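That multiplication matters: a 96001-token context times 4 parallel slots means ollama sizes the KV cache for roughly 384k tokens. To keep the full context for a single request, pin the parallelism yourself; a minimal override, using the same `systemctl edit ollama` approach as above:

```
[Service]
Environment="OLLAMA_NUM_PARALLEL=1"
```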

You can set `OLLAMA_SCHED_SPREAD=1` to have ollama divide the model across all cards. Theoretically this will allow each of the GPUs to work on a different completion at the same...
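A sketch of the corresponding override, combining the spread flag with explicit parallelism so each card can serve its own completion (values illustrative, same `systemctl edit` mechanism as above):

```
[Service]
Environment="OLLAMA_SCHED_SPREAD=1"
Environment="OLLAMA_NUM_PARALLEL=4"
```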