I ran some quick tests and tokens per second increased by 14% going from AVX to AVX2, so enabling additional CPU features for the CUDA build seems like a good idea.
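Before choosing build flags, it's worth confirming which vector extensions the host CPU actually advertises. On Linux, something like this works (a sketch; it just filters the `flags` line of `/proc/cpuinfo`):

```shell
# Print the AVX-family flags this CPU advertises (Linux only).
flags=$(grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^avx' | sort -u)
echo "${flags:-no AVX support reported}"
```

If `avx2` doesn't appear in the output, an AVX2-enabled build will crash with an illegal-instruction error rather than run slower.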
I built a version of the CUDA runner with AVX2 enabled and tested it against stock 0.3.4. Model: qwen2:0.5b, prompt: "why is the sky blue?", GPU: RTX 4070. Baseline CPU performance in...
On a Linux system using Docker:

```diff
--- a/Dockerfile
+++ b/Dockerfile
@@ -18,7 +18,7 @@ ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
 COPY --from=llm-code / /go/src/github.com/ollama/ollama/
 WORKDIR /go/src/github.com/ollama/ollama/llm/generate
 ARG CGO_CFLAGS
-RUN OLLAMA_SKIP_STATIC_GENERATE=1 OLLAMA_SKIP_CPU_GENERATE=1 sh...
```
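With a change along those lines in place, the extra flags can be passed through the existing `CGO_CFLAGS` build arg when rebuilding the image (a sketch; the image tag is arbitrary and the exact flag set is an assumption, not the project's official build configuration):

```shell
# Rebuild the image, passing AVX2/FMA/F16C codegen flags through CGO_CFLAGS.
docker build --build-arg CGO_CFLAGS="-mavx2 -mfma -mf16c" -t ollama:avx2 .

# Run it and compare tokens/s against the stock image.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 ollama:avx2
```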
`mixtral:8x22b-text-v0.1-q2_K` is pre-trained (that's what `text` means) and does not support tools. Try [mixtral:8x22b-instruct-v0.1-q2_K](https://ollama.com/library/mixtral:8x22b-instruct-v0.1-q2_K).
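For reference, a tool-call request to the instruct model goes to the `/api/chat` endpoint and looks roughly like this (a sketch; `get_weather` and its schema are made-up examples):

```shell
# Write a hypothetical /api/chat request that declares one tool.
cat > payload.json <<'EOF'
{
  "model": "mixtral:8x22b-instruct-v0.1-q2_K",
  "messages": [
    {"role": "user", "content": "What is the weather in Toronto?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF

# Sanity-check the JSON, then send it:
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
# curl http://localhost:11434/api/chat -d @payload.json
```

Against the `text` variant, the same request returns an error because the pre-trained model has no tool-calling template.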
If you set `OLLAMA_DEBUG=1`, you will probably see a lot of `shifting` messages when this occurs. I think what has happened is that the model has lost coherence and is "rambling"...
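For reference, debug logging can be enabled like this (a sketch; the systemd override applies to the standard Linux package install):

```shell
# Standalone binary: run the server with debug logging enabled.
OLLAMA_DEBUG=1 ollama serve

# Linux package install: add an override to the systemd unit instead,
#   sudo systemctl edit ollama.service
# with:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"
# then restart and follow the log for shift messages:
#   sudo systemctl restart ollama
#   journalctl -u ollama -f | grep -i shifting
```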
[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging. It would be helpful to also have a copy of the image that caused the failure.

```console
$ ollama run qnguyen3/nanollava "tell me what...
```
This affects CPU-based runners (cpu, cpu_avx, cpu_avx2) from 0.3.14 onwards. Earlier versions work fine, as do CUDA-based runners in all versions through 0.4.0-rc6. ROCm and Metal are untested...
It would be helpful if you could provide the server log, the model, and an example of the input you are using.
It would be helpful if you could provide the server log.
Looks like the same issue. You can either roll back to 0.3.13, try a different model, or [get a GPU](https://www.jeffgeerling.com/blog/2024/use-external-gpu-on-raspberry-pi-5-4k-gaming).