### Update

After seeing PR #835, I pushed some more changes that only affect the `Q4_0` results. I now get

```
rmse = 0.00185228
```

for the 7B model. Perplexity...
I was surprised by the belief that, for the dot product `x * y`, where `x` holds quantized model weights and `y` contains floating point values, it is faster to quantize...
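Since the excerpt is truncated, here is a minimal sketch of what quantizing `y` to 8 bits can look like, using a hypothetical block layout that mirrors ggml's `Q8_0` (one float scale per 32 signed 8-bit values); the names are illustrative, not the PR's actual code:

```c
#include <math.h>
#include <stdint.h>

#define QK8_0 32

// Hypothetical Q8_0 block: one float scale per 32 values.
typedef struct {
    float  d;          // scale
    int8_t qs[QK8_0];  // quantized values in [-127, 127]
} block_q8_0;

// Quantize one block of 32 floats to 8 bits.
static void quantize_block_q8_0(const float *y, block_q8_0 *b) {
    float amax = 0.0f;
    for (int j = 0; j < QK8_0; ++j) {
        const float a = fabsf(y[j]);
        if (a > amax) amax = a;
    }
    b->d = amax / 127.0f;
    const float id = b->d ? 1.0f / b->d : 0.0f;
    for (int j = 0; j < QK8_0; ++j) {
        b->qs[j] = (int8_t)roundf(y[j] * id);
    }
}
```

With both operands in integer form, the dot product reduces to int8 multiply-adds plus a single multiply by `d_x * d_y` per block.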
For `quantize-stats` we get

```
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct
```
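For reference, a hedged sketch of how such error statistics can be computed: round-trip the weights through quantize/dequantize and accumulate the errors. The `quant_error` helper and the `qerr` type are hypothetical stand-ins, not the actual `quantize-stats` code:

```c
#include <math.h>
#include <stddef.h>

typedef struct { double rmse, maxerr; } qerr;

// w:            original weights
// w_roundtrip:  the same weights after quantize + dequantize
static qerr quant_error(const float *w, const float *w_roundtrip, size_t n) {
    double sum2 = 0.0, maxerr = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double e = fabs((double)w[i] - (double)w_roundtrip[i]);
        sum2 += e * e;
        if (e > maxerr) maxerr = e;
    }
    return (qerr){ sqrt(sum2 / n), maxerr };
}
```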
The idea being that `Q8_0`-quantized values get used many times in the matrix multiplications in which they are involved. In the current implementation, when we are evaluating the dot...
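To illustrate the amortization argument, here is a sketch of a matrix-vector product in which `y` is quantized once and then reused by every row's dot product. The block layouts match the earlier sketch, and `vec_dot_q4_0_q8_0` is a hypothetical `Q4_0` × `Q8_0` integer dot product (an AVX2 sketch of one appears further below):

```c
#include <stdint.h>

#define QK8_0 32

typedef struct { float d; uint8_t qs[QK8_0/2]; } block_q4_0; // 32 x 4-bit + scale
typedef struct { float d; int8_t  qs[QK8_0];   } block_q8_0; // 32 x 8-bit + scale

void  quantize_block_q8_0(const float *y, block_q8_0 *b);                        // sketched above
float vec_dot_q4_0_q8_0(int nblocks, const block_q4_0 *x, const block_q8_0 *y);  // sketched below

// Matrix-vector product: y is quantized once, then reused by every row,
// so the one-time quantization cost is amortized over nrows dot products.
void matvec_q4_0(int nrows, int ncols, const block_q4_0 *x, const float *y, float *out) {
    const int nblocks = ncols / QK8_0;
    block_q8_0 yq[nblocks];                        // quantized once ...
    for (int i = 0; i < nblocks; ++i) {
        quantize_block_q8_0(y + i * QK8_0, &yq[i]);
    }
    for (int r = 0; r < nrows; ++r) {              // ... used nrows times
        out[r] = vec_dot_q4_0_q8_0(nblocks, x + r * nblocks, yq);
    }
}
```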
The PR adds a new build option (`LLAMA_NO_RMSE`), which is off by default. When it is off, quantization for all current types (`Q4_0`, `Q4_1`, `Q4_2`, `Q4_3`) is performed with RMSE minimization (on master...
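As a rough illustration of what per-block RMSE minimization can look like (this is a sketch of the general idea, not the PR's actual algorithm; the candidate-scale count and scan range are made up):

```c
#include <math.h>

#define NSCALE 16  // number of candidate scales to try (illustrative)

// Find a scale for one block of 4-bit quantization (values in [-8, 7]).
// Instead of fixing the scale from the max, try candidate scales around
// it and keep the one with the lowest squared reconstruction error.
static float best_scale_q4(const float *x, int n /* block size, e.g. 32 */) {
    float amax = 0.0f;
    for (int j = 0; j < n; ++j) if (fabsf(x[j]) > amax) amax = fabsf(x[j]);
    if (amax == 0.0f) return 0.0f;
    float best_d = amax / 8.0f, best_err = INFINITY;
    for (int k = 0; k < NSCALE; ++k) {
        // Candidate scales from ~0.6x to ~1.1x of the max-based scale.
        const float d  = amax / 8.0f * (0.6f + 0.5f * k / (NSCALE - 1));
        const float id = 1.0f / d;
        float err = 0.0f;
        for (int j = 0; j < n; ++j) {
            int q = (int)roundf(x[j] * id);
            if (q < -8) q = -8; else if (q > 7) q = 7;
            const float diff = x[j] - d * q;
            err += diff * diff;
        }
        if (err < best_err) { best_err = err; best_d = d; }
    }
    return best_d;
}
```

Scanning a handful of candidate scales per 32-weight block is cheap, and it happens only once, at model quantization time.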
Variable bit rate is commonly used in audio and video compression, so why not try it on LLMs? My guess is that a locally adaptive variable bit rate would require a...
Implemented mostly following the `Q4_0` Metal implementation. It is slightly slower than `Q4_0`: on my 30-core M2 Max GPU with `256` tokens it takes `28.1` ms/token, compared to `27.0` ms/token for `Q4_0`.
27.1 ms/token on a 30-core M2 Max GPU, so about the same speed as `Q4_0`. Memory throughput is ~156 GB/s. The access pattern used in the `Q2_K` CUDA implementation...
As discussed [elsewhere](https://github.com/ggerganov/llama.cpp/pull/6840#issuecomment-2079823076), here is a PR that improves AVX2 prompt processing for k-quants and `IQ4_XS` by a large margin. I did not manage to get the speed gains via...
It seems some people still use the `ggml` legacy quants `Q4_0`, `Q4_1`, `Q5_0` and `Q5_1`, so here is a PR that improves matrix multiplication performance for these quants on AVX2...
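For flavor, here is a hedged AVX2 sketch of the kind of fused `Q4_0` × `Q8_0` dot product such kernels build on. The block layouts are the hypothetical ones from the sketches above, and this is illustrative rather than the PR's actual code (requires AVX2 + FMA, e.g. `-mavx2 -mfma`):

```c
#include <immintrin.h>
#include <stdint.h>

#define QK 32

typedef struct { float d; uint8_t qs[QK/2]; } block_q4_0; // 32 x 4-bit + scale
typedef struct { float d; int8_t  qs[QK];   } block_q8_0; // 32 x 8-bit + scale

// Dot product of a Q4_0 row segment with a Q8_0 row segment.
// Assumed nibble layout: qs[j] holds element j (low nibble) and
// element j+16 (high nibble), both stored with a +8 offset.
static float vec_dot_q4_0_q8_0(int nblocks, const block_q4_0 *x, const block_q8_0 *y) {
    __m256 acc = _mm256_setzero_ps();
    const __m256i low4 = _mm256_set1_epi8(0x0F);
    const __m256i off8 = _mm256_set1_epi8(8);
    for (int i = 0; i < nblocks; ++i) {
        // Expand 16 bytes of nibbles to 32 bytes: elements 0..15, then 16..31.
        const __m128i packed = _mm_loadu_si128((const __m128i *)x[i].qs);
        __m256i q4 = _mm256_set_m128i(_mm_srli_epi16(packed, 4), packed);
        q4 = _mm256_sub_epi8(_mm256_and_si256(q4, low4), off8); // -> [-8, 7]
        const __m256i q8 = _mm256_loadu_si256((const __m256i *)y[i].qs);
        // maddubs needs an unsigned first operand: use |q4| and fold q4's sign into q8.
        const __m256i p16 = _mm256_maddubs_epi16(_mm256_sign_epi8(q4, q4),
                                                 _mm256_sign_epi8(q8, q4));
        const __m256i p32 = _mm256_madd_epi16(p16, _mm256_set1_epi16(1));
        acc = _mm256_fmadd_ps(_mm256_set1_ps(x[i].d * y[i].d),
                              _mm256_cvtepi32_ps(p32), acc);
    }
    // Horizontal sum of the 8 float lanes.
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```

The `_mm256_maddubs_epi16` trick (unsigned × signed byte multiply-add) is the standard way to get 8-bit dot products on AVX2, since the instruction set has no signed × signed byte multiply.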