
Wait, why is everyone running gpt4all on CPU?

yhyu13 opened this issue 1 year ago

I want to get some clarification on this terminology:

- llama.cpp (https://github.com/ggerganov/llama.cpp): a C++ implementation for running LLaMA models on CPU for inference.
- ggml (https://github.com/ggerganov/ggml): the tensor library and file format that LLaMA models are converted to so they can run under llama.cpp.

Both are particularly optimized for AVX and NEON CPU instructions.
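For illustration, here is a minimal sketch of what CPU inference looks like through the llama-cpp-python bindings (assuming they are installed; the model path is a placeholder for a checkpoint already converted to ggml):

```python
# Minimal CPU-only inference sketch via the llama-cpp-python bindings
# (pip install llama-cpp-python). The model path is a placeholder for a
# LLaMA checkpoint already converted to the ggml format and quantized.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")  # runs on CPU

output = llm(
    "Q: Why can quantized models run on a CPU? A:",
    max_tokens=64,   # cap the response length
    stop=["Q:"],     # stop before the model starts a new question
)
print(output["choices"][0]["text"])
```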

gpt4allGPU: allows original (i.e., non-ggml) format models to run on GPU.
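I haven't shown gpt4allGPU's exact API here, but the general idea of running original-format weights on GPU looks roughly like this sketch with Hugging Face transformers (the local checkpoint path is hypothetical):

```python
# Generic sketch of running an original (non-ggml) checkpoint on a GPU
# with Hugging Face transformers; the model path below is a placeholder
# for whatever local checkpoint you actually have.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-7b-hf"  # hypothetical path to the original weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16  # fp16 to fit in GPU memory
).to("cuda")

inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
tokens = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```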

And generally speaking, isn't running LLMs on GPU much faster than even these optimized CPU counterparts?

yhyu13 · Apr 15 '23

It's just to show that it is possible to run them on CPUs. Soon AI will not need GPU power; that's where it's headed now. It'll get compressed to the lowest size and still run perfectly well.
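As a toy illustration of that compression idea, here is a sketch of symmetric 4-bit block quantization, similar in spirit to (though not byte-identical to) ggml's q4_0 scheme:

```python
# Toy sketch of symmetric 4-bit block quantization, similar in spirit to
# ggml's q4_0 (not its exact on-disk layout): each block of 32 weights is
# stored as one float scale plus 32 signed 4-bit integers, roughly an 8x
# size reduction versus fp32.
import numpy as np

BLOCK = 32

def quantize_q4(weights: np.ndarray):
    blocks = weights.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # map into [-7, 7]
    scales[scales == 0] = 1.0                                 # avoid div by zero
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_q4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q4(w)
err = np.abs(w - dequantize_q4(q, s)).max()
print(f"max abs reconstruction error: {err:.4f}")
```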

Preshy · Apr 15 '23

@Preshy I doubt it. AI models today are basically matrix multiplication operations, which are exactly what GPUs accelerate. CPUs, by contrast, are designed for fast logic operations (i.e., latency), not bulk arithmetic (i.e., throughput), unless you have accelerator chips encapsulated in the CPU like the M1/M2. But that's just like gluing a GPU next to the CPU.
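To make the throughput-vs-latency point concrete, here is a rough sketch that times the same matrix multiplication on CPU and GPU with PyTorch (exact numbers depend entirely on your hardware):

```python
# Rough sketch: time the same matrix multiplication on CPU vs GPU with
# PyTorch. GPUs are built for throughput-bound arithmetic like this, so
# the CUDA timing typically comes out far lower on large matrices.
import time
import torch

N = 4096
a = torch.randn(N, N)
b = torch.randn(N, N)

t0 = time.perf_counter()
a @ b
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()      # wait for host-to-device copies to finish
    t0 = time.perf_counter()
    a_gpu @ b_gpu
    torch.cuda.synchronize()      # CUDA matmul is async; wait before timing
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s")
else:
    print(f"CPU: {cpu_s:.3f}s (no CUDA device found)")
```

On typical hardware the GPU timing comes out far lower for large matrices, which is exactly the gap the quantized CPU builds are trying to narrow.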

And with Intel entering the graphics GPU market, I am not sure Intel will be motivated to release AI-accelerated CPUs, because CPUs with AI acceleration generally grow larger in die size, which would invalidate the current-generation socket design for PC motherboards.

yhyu13 · Apr 16 '23

It's just a simpler, cheaper, and more portable way to run them. It also shows the progress made by optimizing the code instead of just throwing more brute force at the problem with better hardware, more memory, and such.

danielkariv · Apr 16 '23

Even so, the SOTA will still get run on GPUs because it's just faster and better. CPUs are for playing around or running edge applications where better resources are not available. For commercial and research applications, CPU is not a viable option.

digisomni · Apr 20 '23