
Models loading on CPU instead of GPU after updating version

Open KuraiAI opened this issue 9 months ago • 4 comments

I updated my version because some DeepSeek models were not loading; after updating, they load, but only on CPU. Other, older models on my system that used to load on GPU now also load only onto CPU. I noticed this line in particular, which others have mentioned for the same issue: `tensor 'token_embd.weight' (q4_K) (and 322 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead`
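
For reference, this is roughly how I'm loading the models. It's a minimal sketch; the model path is just an example from my setup, and the parameters are the ones I'd expect to trigger GPU offload:

```python
from llama_cpp import Llama

# Minimal repro sketch -- the model path below is just an example placeholder.
# n_gpu_layers=-1 asks llama.cpp to offload all layers to the GPU;
# verbose=True prints the load log, which is where the buffer-type
# message quoted above shows up.
llm = Llama(
    model_path="./models/example-deepseek-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=-1,
    verbose=True,
)

print(llm("Hello", max_tokens=8))
```

On the newer versions, the load log shows everything staying on CPU even with `n_gpu_layers=-1`.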

I downgraded to version 0.3.6 and models load onto my GPU again.

I can just keep using the older version, but it would be nice if this gets fixed so that those of us with this issue aren't locked out of newer versions.

KuraiAI · Mar 05 '25