llm
Performance of inference with 65B model on high-end CPU?
How well does this model perform on a CPU? Are there benchmarks for running some of the bigger models (like LLaMA-65B) on a CPU?
I don't have enough RAM to test, but I'd suggest looking at the performance numbers for llama.cpp - we should be about on par (barring any improvements that we haven't kept up with).
I don't have hard numbers, but I get somewhere under one token/sec on a 7950x with alpaca-lora-65b-ggml-q4_0. You probably won't find models larger than 30B to be practical.
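
For a rough sense of why 65B is mostly a RAM question, here's a back-of-the-envelope sketch (not from this thread) assuming GGML's q4_0 layout of roughly 18 bytes per block of 32 weights; actual file sizes and runtime overhead (context/KV cache, scratch buffers) will add to this:

```rust
// Rough RAM estimate for q4_0-quantized weights.
// Assumption: q4_0 packs 32 weights per block as 16 bytes of 4-bit
// values plus a 2-byte f16 scale, i.e. ~18 bytes per 32 weights.
fn q4_0_bytes(n_params: u64) -> u64 {
    (n_params / 32) * 18
}

fn main() {
    let models = [
        ("7B", 7_000_000_000u64),
        ("30B", 30_000_000_000),
        ("65B", 65_000_000_000),
    ];
    for (name, params) in models {
        let gib = q4_0_bytes(params) as f64 / (1024.0 * 1024.0 * 1024.0);
        // Weights only; context and scratch memory come on top of this.
        println!("{name}: ~{gib:.1} GiB of weights");
    }
}
```

That works out to roughly 34 GiB of weights for 65B at q4_0 (versus ~16 GiB for 30B), which lines up with why 30B tends to be the practical ceiling on typical desktop RAM.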