Results 806 comments of frob

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#amd-gpu-discovery:~:text=%22exec%2Dopts%22%3A%20%5B%22native.cgroupdriver%3Dcgroupfs%22%5D

Have you tried setting `native.cgroupdriver` as in the provided link?

```console
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```
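Since a syntax error in `daemon.json` will stop the Docker daemon from starting, it's worth validating the file before restarting Docker. A minimal sketch (using a temp copy here rather than the real `/etc/docker/daemon.json`, and `python3 -m json.tool` as a convenient validator):

```shell
# Sketch: write a sample daemon.json to a temp path and check it parses as JSON
cat > /tmp/daemon.json <<'EOF'
{
  "runtimes": {
    "nvidia": { "args": [], "path": "nvidia-container-runtime" }
  },
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF

# json.tool exits non-zero on malformed JSON
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "valid JSON"
```

After confirming the real file is valid, restart the Docker daemon so the new `exec-opts` take effect.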

Please post the full log; there are earlier log lines about device detection and memory calculations that may be relevant. Also set `OLLAMA_DEBUG=1` in the [server environment](https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-windows), it may give...
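On Windows, follow the linked FAQ to set the variable system-wide; on Linux/macOS a quick sketch is to export it in the shell that launches the server (assuming you run `ollama serve` manually rather than via a service manager):

```shell
# Export OLLAMA_DEBUG in the server's environment before starting it;
# the variable must be set for the server process, not the client.
export OLLAMA_DEBUG=1
# then: ollama serve
```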

Unfortunately it's not clear why the alloc failed; it seems like there should be plenty of VRAM available. The 0xc0000409 (STATUS_STACK_BUFFER_OVERRUN) exit status suggests that recovery from the failed alloc...

Create a new model:

```sh
$ ollama show --modelfile llama3.1:70b-instruct-q3_K_L | sed -e 's/^FROM.*/FROM llama3.1:70b-instruct-q3_K_L/' > Modelfile
$ echo "PARAMETER num_gpu 75" >> Modelfile
$ ollama create llama3.1:70b-instruct-q3_K_L-ng75
$ ollama...
```
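For reference, the resulting `Modelfile` from those steps would look roughly like the sketch below (the real file also carries the TEMPLATE and any other PARAMETER lines copied from the base model by `ollama show --modelfile`):

```
FROM llama3.1:70b-instruct-q3_K_L
PARAMETER num_gpu 75
```

The `sed` rewrite of the FROM line just points the new model at the existing base model by name instead of the blob path, so the weights aren't duplicated.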

Gemma2 needs ollama 0.1.47 or newer. mistral-nemo needs ollama 0.2.8 or newer.

```
load_tensors: tensor 'token_embd.weight' (q6_K) (and 42 others) cannot be used with preferred buffer type CUDA_Host, using CPU instead
```

This is not blocking, it's just indicating that some tensors...

ollama has calculated that it can only fit 13 layers on the GPUs (`layers.offload=13`) which would take 36G of 39G on each GPU. You have overridden that by setting `num_gpu...
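The arithmetic behind that estimate, sketched with hypothetical rounded numbers (the real per-layer footprint depends on the model, quantization, and context size):

```shell
# Back-of-envelope for layers.offload=13: at roughly 2.77G per layer
# (hypothetical figure), 13 layers land near the 36G ollama budgeted
# out of the 39G usable per GPU.
layers=13
per_layer_mb=2770   # hypothetical approx. per-layer footprint in MB
echo $(( layers * per_layer_mb )) MB
```

Forcing more layers than this estimate allows via `num_gpu` is what leads to the allocation failure.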

Please provide the full log with `num_gpu` unset.