Models loading on CPU instead of GPU after updating version
I updated my version because some DeepSeek models were failing to load; after updating, they load, but only on CPU. Older models on my system that used to load on GPU now load only onto CPU as well.
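For reference, this is roughly how I'm loading models (a minimal sketch; the model path and parameter values are placeholders, not my exact setup):

```python
from llama_cpp import Llama

# Request full GPU offload; on the affected versions the tensors still
# end up in CPU buffers (see the log line below).
llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,                   # ask to offload all layers to the GPU
    verbose=True,                      # print the buffer/offload log lines
)
```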
I noticed this line in particular, which others have mentioned for the same issue:
tensor 'token_embd.weight' (q4_K) (and 322 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
I downgraded to version 0.3.6 and models load onto my GPU again.
I can keep using the older version, but it would be nice if this were fixed so that those of us with this issue aren't locked out of newer versions.
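If it helps anyone compare, this is how I confirmed which build was actually installed after the downgrade (assuming the package exposes `__version__`, which it does in the releases I tried):

```python
import llama_cpp

# Print the installed llama-cpp-python version to confirm the downgrade took effect
print(llama_cpp.__version__)
```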
Having the exact same issue, but only if I go above 0.3.4.
Which CUDA version are you using, and are you using a pre-made wheel from https://abetlen.github.io/llama-cpp-python/whl/?
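If it helps, one quick check to rule out a CPU-only wheel (a sketch; it assumes `llama_supports_gpu_offload` is exposed at the package level, which it is in the builds I've used):

```python
import llama_cpp

# False here means the installed wheel has no GPU backend compiled in,
# so everything falls back to CPU regardless of n_gpu_layers.
print(llama_cpp.llama_supports_gpu_offload())
```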
Same issue in Docker version... womp.
I have a similar issue, and the model I use is not supported in v0.3.6.
I'm running into the same issue.