gpt4all
Wait, why is everyone running gpt4all on CPU?
I want to get some clarification on these terms:
- llama.cpp (https://github.com/ggerganov/llama.cpp): a C++ implementation for running LLaMA models on CPU for inference.
- ggml (https://github.com/ggerganov/ggml): the format LLaMA models are converted to so they can run on llama.cpp.

Both are particularly optimized for AVX (x86) and NEON (ARM) CPU instructions.
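For reference, here's a minimal sketch of what CPU inference looks like through the llama-cpp-python bindings; the model path is a placeholder, and any ggml-converted checkpoint would work:

```python
# Minimal sketch of CPU inference via llama.cpp's Python bindings.
# Assumes `pip install llama-cpp-python` and a local ggml-format model
# file; the path below is a placeholder, not a real checkpoint.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/ggml-model-q4_0.bin",  # hypothetical local file
    n_threads=8,   # llama.cpp parallelizes matmuls across CPU cores
    n_ctx=2048,    # context window size
)

output = llm("Q: What is AVX? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```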
gpt4allGPU: allows original-format (i.e., non-ggml) models to run on GPU.
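I can't speak to gpt4allGPU's exact API, but the general pattern for running an original-format (non-ggml) checkpoint on GPU looks something like this with Hugging Face transformers; the model id here is purely illustrative:

```python
# Sketch of running an original (non-ggml) checkpoint on GPU with
# Hugging Face transformers; the model id is illustrative, not a
# statement about what gpt4allGPU uses internally.
# Requires `pip install transformers accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decapoda-research/llama-7b-hf"  # hypothetical/illustrative id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit consumer VRAM
    device_map="auto",          # place layers on the available GPU(s)
)

inputs = tok("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```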
And generally speaking, isn't running LLMs on a GPU much faster than even these optimized CPU counterparts?
It's just to show that it is possible to run them on CPUs. Soon AI won't need GPU power; that's where it's headed now. Models will get compressed to the smallest possible size and still run perfectly well.
@Preshy I doubt it. AI models today are basically matrix multiplication operations, which GPUs excel at. CPUs, by contrast, are designed not for bulk arithmetic (i.e., throughput) but for fast logic operations (i.e., low latency), unless you have accelerator chips encapsulated in the CPU, as in the M1/M2. But that's essentially just gluing a GPU next to the CPU.
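If you want to see the throughput gap yourself, here's a rough sketch that times the same large matmul on CPU and, if available, on GPU (PyTorch assumed; absolute numbers depend entirely on your hardware):

```python
# Rough illustration of the throughput argument: time one large
# matrix multiplication on CPU and (if available) GPU. Results vary
# wildly by hardware; this is a sketch, not a benchmark.
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

t0 = time.perf_counter()
a @ b                                    # CPU matmul
cpu_s = time.perf_counter() - t0
print(f"CPU: {cpu_s:.3f}s (~{2 * n**3 / cpu_s / 1e9:.1f} GFLOP/s)")

if torch.cuda.is_available():
    a_g, b_g = a.cuda(), b.cuda()
    torch.cuda.synchronize()             # finish the transfers first
    t0 = time.perf_counter()
    a_g @ b_g
    torch.cuda.synchronize()             # wait for the async kernel
    gpu_s = time.perf_counter() - t0
    print(f"GPU: {gpu_s:.3f}s (~{2 * n**3 / gpu_s / 1e9:.1f} GFLOP/s)")
```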
And with Intel entering the discrete GPU market, I'm not sure Intel will be motivated to release AI-accelerated CPUs, because CPUs with AI acceleration generally grow larger in die size, which would invalidate the current-generation socket designs for PC motherboards.
It's just a simpler, cheaper, and more portable way to run them. It also shows the progress made by optimizing the code instead of just throwing more brute force at the problem with better hardware, more memory, and so on.
Even so, the state of the art will still run on GPUs, because they're simply faster and better. CPUs are for playing around or for edge applications where better resources aren't available. For commercial and research workloads, CPU inference isn't a viable option.