
Can I get a little clarification over my understanding for the terminologies and the GGUF models?

Open · AayushSameerShah opened this issue 2 years ago · 0 comments

Hello, community! Recently I have been following the rise of llama.cpp and ctransformers, and how they have made it possible for anyone to run LLMs on a personal computer. I have some basic gaps in my knowledge that make it hard to wrap my mind around this surge.

1️⃣ Running on CPU

I want to run the llama-2-chat model in a quantized format, and all I have is a CPU. I understand that it can run on a CPU, but is the speed drastically slower than on a GPU?

In particular, is there anything that can accelerate inference while running the 4-bit quantized model on a CPU?
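For context, here is my rough back-of-envelope math on why the 4-bit model should at least fit in ordinary RAM (my own estimate; the bits-per-weight figures for the GGML `q4_0`/`q8_0` formats include the per-block fp16 scale, and the numbers ignore the KV cache and runtime overhead):

```python
# Approximate weight-storage size of Llama-2-7B at different precisions.
PARAMS = 7_000_000_000

def gib(params: int, bits_per_weight: float) -> float:
    """Weights-only size in GiB; ignores KV cache and runtime overhead."""
    return params * bits_per_weight / 8 / 2**30

# q4_0 stores 4-bit weights plus an fp16 scale per 32-weight block: 4.5 bits/weight.
# q8_0 similarly works out to roughly 8.5 bits/weight.
for name, bits in [("fp16", 16), ("q8_0", 8.5), ("q4_0", 4.5)]:
    print(f"{name}: ~{gib(PARAMS, bits):.1f} GiB")
```

So a 4-bit 7B model needs on the order of 4 GiB for the weights alone, versus roughly 13 GiB at fp16, which is what makes CPU-only inference on a typical desktop plausible at all.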

2️⃣ What is BLAS for, and what is all the other jargon?

When I went through the llama.cpp installation instructions, there were many, and I mean many, steps to follow. BLAS looked like it could accelerate inference, but then what is mpirun? There are a lot of moving parts...

The question is: will BLAS still help while running on a CPU only? Is it required?

3️⃣ How is llama.cpp different from ctransformers?

Okay, there is llama.cpp, and there are bindings in other languages: llama-cpp-python for Python, java-llama.cpp for Java... but what is ctransformers? How is it different from llama.cpp?

4️⃣ Can I get a filtered, step-by-step guide for installation on Windows?

The README of llama.cpp is pretty clear, but it covers every OS and build option on a single page, which makes it hard to find the path relevant to your own use case.

My purpose:

  • Run LLama-2-chat model (https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML)
  • On CPU only
  • On Windows
  • With Python
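For concreteness, here is a sketch of what I imagine the Python side would look like. The `pick_threads` helper is hypothetical (my own guess at a sensible default); the commented-out `from_pretrained` call mirrors what I understand ctransformers' README to show, and the `model_file` name is my assumption of the q4_0 file in TheBloke's repository:

```python
import os

def pick_threads() -> int:
    """Guess a thread count for CPU inference.

    Hypothetical heuristic: use about half the logical cores, since
    hyperthreads reportedly add little for GGML-style inference.
    """
    n = os.cpu_count() or 4
    return max(1, n // 2)

# The actual load would look roughly like this (requires
# `pip install ctransformers` and downloads a multi-GB model,
# so it is left as a comment to keep this sketch runnable):
#
#   from ctransformers import AutoModelForCausalLM
#   llm = AutoModelForCausalLM.from_pretrained(
#       "TheBloke/Llama-2-7B-Chat-GGML",
#       model_file="llama-2-7b-chat.ggmlv3.q4_0.bin",  # assumed filename
#       model_type="llama",
#       threads=pick_threads(),
#   )
#   print(llm("Tell me a joke."))

if __name__ == "__main__":
    print(f"Would run with threads={pick_threads()}")
```

If something like this is on the right track, my questions above boil down to what, besides the thread count, actually affects speed on a CPU-only Windows box.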

I know this isn't the place to ask how to install llama.cpp, but this repository seems related, which is why I am asking here too. So, what should I install to get the maximum inference speed? Would you please guide me through that?


I know I am asking a lot, but if you can provide a simple and straightforward guide to getting the maximum speed for my requirements, that would be amazing!

Apologies for the newbie questions. Thanks! 🙏🏻

AayushSameerShah · Sep 18 '23 08:09