llama-cpp-python
Models loading on CPU instead of GPU after updating version
I updated my version because some DeepSeek models were failing to load; after updating, they loaded, but only on CPU. Older models on my system that used to load on GPU now load only on CPU as well.
I noticed this log line in particular, which others have mentioned for the same issue:
tensor 'token_embd.weight' (q4_K) (and 322 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
I downgraded to version 0.3.6 and models load onto my GPU again.
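For anyone else hitting this, the downgrade workaround is something like the following. Note the `CMAKE_ARGS` value is an assumption on my part: `-DGGML_CUDA=on` applies to a CUDA build compiled from source; adjust (or drop it) for Metal, ROCm, or prebuilt wheels.

```shell
# Reinstall the last version that still offloaded to GPU for me.
# CMAKE_ARGS only matters when pip builds the package from source;
# -DGGML_CUDA=on assumes an NVIDIA/CUDA setup (assumption, not from the report).
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.6 \
    --force-reinstall --no-cache-dir
```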
I can keep using the older version, but it would be nice if this were fixed so that those of us hitting this issue aren't locked out of newer releases.