phalexo
Did you mean to say 0.1.13 works? It is v0.1.11 that works for me. I copied the gguf folder from v0.1.11 to v0.1.12, recompiled, and that made v0.1.12 work. Between 11...
@madsamjp, I tried it unsuccessfully with the next version up, v0.1.16; v0.1.15 cannot possibly work. On Fri, Dec 15, 2023 at 4:17 PM Igor Schlumberger ***@***.***> wrote: > @phalexo Ollama 0.1.15...
```bash
git clone --recursive https://github.com/jmorganca/ollama.git
cd ollama/llm/llama.cpp
vi generate_linux.go
```

```go
//go:generate cmake -S ggml -B ggml/build/cuda -DLLAMA_CUBLAS=on -DLLAMA_ACCELERATE=on -DLLAMA_K_QUANTS=on -DLLAMA_CUDA_FORCE_MMQ=on
//go:generate cmake --build ggml/build/cuda --target server --config Release
//go:generate...
```
How is the performance though? Is it impacted by the change? On Sat, Dec 16, 2023, 6:11 AM madsamjp ***@***.***> wrote: > @phalexo this works! It seems that adding >...
Did you ever test performance with the MMQ flag versus 0.1.11? On Wed, Jan 3, 2024, 1:11 PM madsamjp ***@***.***> wrote: > @technovangelist I've updated to the latest...
All my testing is ad hoc, so it is difficult to assess. I thought you ran a largish system, so it might be noticeable there. I have a suspicion that there may be a...
I see very similar behavior running on GPUs; VRAM usage is less than 50%. Besides the #### output, I see the page scrolling pretty fast. This is the response to the first query....
Check the prompt format for this model. I think I've seen this when I failed to use the correct prompt format. On Mon, Jan 8, 2024, 9:01 AM simplesisu ***@***.***>...
> A lot of users don't understand that they are facing a memory error. It would be nice to explain in the error message that it is a memory error. > >...
The most likely bug is that one of the specialized matrix/matrix multiply kernels is leaking memory. On Fri, Jan 5, 2024, 1:01 PM Igor Schlumberger ***@***.***> wrote: > @jukofyork I'm...