Local-LLM-Comparison-Colab-UI

Which runs on least powerful hardware...

Open · epugh opened this issue 1 year ago · 1 comment

Do you have any data or sorting showing which models use the fewest resources, as opposed to which perform best? The use case I have in mind is a very constrained server environment, so low memory and CPU use would be the key driver in picking one, rather than which does best. I'm thinking of something in the 2 GB of RAM range...

epugh, Sep 08 '23 22:09

First, model size affects resource use: a 7B model uses fewer resources than a 13B one. Second, the quantization method affects resource use: 2-bit < 3-bit < 4-bit < 6-bit, and so on. So a small model (7B) combined with a low-bit quantization (e.g. 2- or 3-bit) gives the lowest resource use, but at the cost of high quality loss.

But 2 GB is really too small: even the smallest 7B option listed below, Q2_K, is 2.67G.
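To make the size/quantization trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes the weights take about `parameters × bits / 8` bytes and ignores everything else (context cache, runtime overhead, per-block scale data), so real files come out somewhat different from these numbers. The function name `approx_weight_size_gb` is just made up for this illustration.

```python
def approx_weight_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Nominal size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored with exactly `bits_per_weight` bits.
    Real quantized files carry extra scale data, so they are a bit larger.
    """
    # billions of params * bits / 8 bits-per-byte = billions of bytes = GB
    return n_params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for params in (7, 13):
        for bits in (2, 3, 4, 6, 16):
            size = approx_weight_size_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{size:.2f} GB")
```

Even under this optimistic estimate, a 7B model at 3-bit already needs more than 2 GB for the weights alone.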

Here is some more info on quantization methods:

2 or Q4_0 : 3.50G, +0.2499 ppl @ 7B - small, very high quality loss - legacy, prefer using Q3_K_M
3 or Q4_1 : 3.90G, +0.1846 ppl @ 7B - small, substantial quality loss - legacy, prefer using Q3_K_L
8 or Q5_0 : 4.30G, +0.0796 ppl @ 7B - medium, balanced quality - legacy, prefer using Q4_K_M
9 or Q5_1 : 4.70G, +0.0415 ppl @ 7B - medium, low quality loss - legacy, prefer using Q5_K_M
10 or Q2_K : 2.67G, +0.8698 ppl @ 7B - smallest, extreme quality loss - not recommended
12 or Q3_K : alias for Q3_K_M
11 or Q3_K_S : 2.75G, +0.5505 ppl @ 7B - very small, very high quality loss
12 or Q3_K_M : 3.06G, +0.2437 ppl @ 7B - very small, very high quality loss
13 or Q3_K_L : 3.35G, +0.1803 ppl @ 7B - small, substantial quality loss
15 or Q4_K : alias for Q4_K_M
14 or Q4_K_S : 3.56G, +0.1149 ppl @ 7B - small, significant quality loss
15 or Q4_K_M : 3.80G, +0.0535 ppl @ 7B - medium, balanced quality - recommended
17 or Q5_K : alias for Q5_K_M
16 or Q5_K_S : 4.33G, +0.0353 ppl @ 7B - large, low quality loss - recommended
17 or Q5_K_M : 4.45G, +0.0142 ppl @ 7B - large, very low quality loss - recommended
18 or Q6_K : 5.15G, +0.0044 ppl @ 7B - very large, extremely low quality loss
7 or Q8_0 : 6.70G, +0.0004 ppl @ 7B - very large, extremely low quality loss - not recommended
1 or F16 : 13.00G @ 7B - extremely large, virtually no quality loss - not recommended
0 or F32 : 26.00G @ 7B - absolutely huge, lossless - not recommended
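If you want to pick from that list programmatically, a minimal sketch could look like the following. The sizes and perplexity deltas are copied from the list above (7B only; aliases are skipped, and F16/F32 are left out because no ppl figure is given for them). The `best_fit` helper is hypothetical, and it compares the budget against file size only, which is an assumption: actual RAM use at runtime will be higher than the file size.

```python
from typing import Optional, Tuple

# (name, size in G, perplexity increase vs. unquantized) for 7B, from the list above
QUANTS_7B = [
    ("Q4_0", 3.50, 0.2499),
    ("Q4_1", 3.90, 0.1846),
    ("Q5_0", 4.30, 0.0796),
    ("Q5_1", 4.70, 0.0415),
    ("Q2_K", 2.67, 0.8698),
    ("Q3_K_S", 2.75, 0.5505),
    ("Q3_K_M", 3.06, 0.2437),
    ("Q3_K_L", 3.35, 0.1803),
    ("Q4_K_S", 3.56, 0.1149),
    ("Q4_K_M", 3.80, 0.0535),
    ("Q5_K_S", 4.33, 0.0353),
    ("Q5_K_M", 4.45, 0.0142),
    ("Q6_K", 5.15, 0.0044),
    ("Q8_0", 6.70, 0.0004),
]


def best_fit(budget_gb: float) -> Optional[Tuple[str, float, float]]:
    """Return the entry with the lowest ppl increase whose size fits the budget.

    Returns None when nothing in the list fits.
    """
    candidates = [q for q in QUANTS_7B if q[1] <= budget_gb]
    if not candidates:
        return None
    # Lowest perplexity increase means least quality loss
    return min(candidates, key=lambda q: q[2])


if __name__ == "__main__":
    for budget in (2.0, 3.0, 4.0):
        print(budget, "->", best_fit(budget))
```

With a 2.0 budget nothing in the 7B list fits, which matches the point above that 2 GB is too small for these models.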

Troyanovsky, Sep 11 '23 02:09