Chris
Chris
Is this expected behaviour and if so, how to disable it?
The requirements.txt use a version of bitsandbytes that is not compiled for GPU support, so this fails to work on linux. I have installed version 0.37 of b&b from pip...
RuntimeError: The expanded size of the tensor (1024) must match the existing size (768) at non-singleton dimension 0. Target sizes: [1024]. Tensor sizes: [768]
When I use the ollama API the first response works fine, then without changing anything, subsequent requests give a response as if its ignoring the system prompt and spits out...