Q4_2 quantization with rmse-optimized scale and quants

Open ikawrakow opened this issue 2 years ago • 0 comments

For `quantize-stats` we get q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct < 0.0030, median < 0.0012.

For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.

Quantization is slow (~90 seconds on my Mac for 7B) because it is not multi-threaded as in PR #896.

ikawrakow · Apr 19 '23