There's currently no environment variable that allows configuring that in the service file. A hacky way to achieve this is to prevent ollama from loading the CUDA library: ``` sudo...
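
A minimal sketch of that workaround, assuming a systemd install; the library path is an assumption and varies by ollama version and install method:

```
# Hide the bundled CUDA backend so ollama falls back to the CPU runner.
# /usr/lib/ollama/cuda_v12 is a guess; check where your install put it.
sudo systemctl stop ollama
sudo mv /usr/lib/ollama/cuda_v12 /usr/lib/ollama/cuda_v12.disabled
sudo systemctl start ollama
```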

Set `OLLAMA_LOAD_TIMEOUT=30m` in the server environment.
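
On a systemd install that means editing the service override; `systemctl edit` is the documented way to do it:

```
sudo systemctl edit ollama.service
# In the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_LOAD_TIMEOUT=30m"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```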

Cloud server? https://github.com/ollama/ollama/issues/9292#issuecomment-2676702560

ollama [0.5.2-rc3](https://github.com/ollama/ollama/releases/tag/v0.5.2-rc3) bumps to a new version of llama.cpp; does that build use the M40s?

```
time=2024-12-12T15:43:20.589Z level=INFO source=runner.go:946 msg=system info="CUDA : ARCHS = 600,610,620,700,720,750,800,860,870,890,900 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX...
```
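
The lowest entry in that ARCHS list is 600 (compute capability 6.0), while the M40 is a Maxwell card at 5.2, so a build compiled with this list wouldn't target it. One way to check what a card reports, assuming a driver new enough to support the `compute_cap` query field:

```
# List each GPU's compute capability; the ARCHS values in the log are
# these numbers with the dot removed (e.g. 6.1 -> 610).
nvidia-smi --query-gpu=name,compute_cap --format=csv
```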

Does that mean setting `OLLAMA_LLM_LIBRARY=cuda_v11` would use all devices?
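
A quick way to test that, assuming you can run a one-off server in the foreground: set the variable for a single session and watch the startup log for which devices get picked up:

```
# Force the cuda_v11 runner for this session only.
OLLAMA_LLM_LIBRARY=cuda_v11 ollama serve
```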

The IPC mechanism between the CPU and GPU is a busy wait, which is why the CPU is at 100%. This has been [discussed](https://github.com/ggml-org/llama.cpp/issues/8684) by people who understand what the...
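
For illustration only (not llama.cpp's actual code), a busy wait looks like this; `gpu_kernel_finished` is a hypothetical check:

```
# The loop polls in a tight spin instead of sleeping or blocking,
# so the waiting core reads as 100% busy even though it's doing no work.
while ! gpu_kernel_finished; do
  :   # re-check immediately; no sleep
done
```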

Server logs would help diagnose the issue. Since larger models require more connections and more data transfer, they're more susceptible to interruptions.
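
On a systemd install the server logs can be pulled with `journalctl`:

```
# -u selects the ollama unit, -e jumps to the end of the log.
journalctl -u ollama -e
```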

If you are happy to treat it as a transient event, close the ticket. If it happens again, re-open it and add some server logs.

Vision support was merged recently (https://github.com/ollama/ollama/pull/6963); 0.3.14 doesn't include it.