ggml SpQR compression method

Open • JianbangZ opened this issue 2 years ago • 2 comments

How feasible would it be to implement SpQR in ggml? (SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression)

Jun 09 '23 12:06 JianbangZ

The paper: https://arxiv.org/pdf/2306.03078.pdf

The code: https://github.com/Vahe1994/SpQR

Jun 12 '23 23:06 gardner

Given this comment (https://github.com/ggerganov/llama.cpp/issues/1602#issuecomment-1597142154), it seems unlikely that SpQR will be implemented any time soon:

The main idea of the SpQR paper is to separate "outliers". This has been tried as part of k-quants development and has been shown to be less effective (see, for instance, https://github.com/ggerganov/llama.cpp/discussions/1595#discussioncomment-6018205 in https://github.com/ggerganov/llama.cpp/discussions/1595).

If we read the SpQR paper more carefully, we find that what they mean by "nearly lossless compression" is a quantized perplexity within 1% of the full model's. The Q4_K_M variant of k-quants achieves this for ggml; see, for instance, PR https://github.com/ggerganov/llama.cpp/pull/1684.

We can probably close this issue.

Dec 31 '23 12:12 PoignardAzur
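For readers unfamiliar with the "separate outliers" idea discussed above, here is a minimal pure-Python sketch under simplifying assumptions: each group of weights is quantized with a plain min-max uniform grid, and the few largest-magnitude weights are pulled out and stored exactly in a sparse index-to-value map, so they no longer stretch the grid for everyone else. The function names (`quantize_minmax`, `quantize_with_outliers`, `relative_ppl_increase`) are hypothetical. This is neither the SpQR reference code (the paper selects outliers by a sensitivity criterion, not raw magnitude) nor how ggml's k-quants work. The last helper only spells out the "within 1% of the full model's perplexity" arithmetic, using made-up perplexity values.

```python
# Toy illustration of "dense low-bit weights + sparse full-precision outliers".
# Simplified assumptions: outliers are picked by magnitude and the dense part
# uses min-max quantization; SpQR itself chooses outliers differently.


def quantize_minmax(values, bits):
    """Asymmetric uniform quantization of one group; returns dequantized values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    if scale == 0.0:
        return list(values)  # constant group: every value equals the offset `lo`
    out = []
    for v in values:
        q = min(levels, max(0, int(round((v - lo) / scale))))  # clamp to grid
        out.append(lo + q * scale)
    return out


def quantize_with_outliers(weights, bits, n_outliers):
    """Keep the n_outliers largest-magnitude weights exactly (sparse part) and
    quantize the rest on a min-max grid fitted to them only (dense part)."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    outliers = {i: weights[i] for i in order[:n_outliers]}  # index -> exact value
    dense_idx = [i for i in range(len(weights)) if i not in outliers]
    dense_vals = [weights[i] for i in dense_idx]
    dense_deq = quantize_minmax(dense_vals, bits) if dense_vals else []
    result = [0.0] * len(weights)
    for i, v in zip(dense_idx, dense_deq):
        result[i] = v
    for i, v in outliers.items():
        result[i] = v
    return result, outliers


def mse(a, b):
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def relative_ppl_increase(ppl_full, ppl_quant):
    """Fractional perplexity increase of the quantized model over the full one."""
    return (ppl_quant - ppl_full) / ppl_full


if __name__ == "__main__":
    w = [0.1, -0.2, 0.05, 0.15, -0.1, 8.0]  # one large weight stretches the range
    plain = quantize_minmax(w, bits=3)
    sparse, kept = quantize_with_outliers(w, bits=3, n_outliers=1)
    print("plain 3-bit MSE:         ", mse(w, plain))
    print("3-bit + 1 outlier MSE:   ", mse(w, sparse), "kept:", kept)
    # Hypothetical perplexities, only to show the "within 1%" arithmetic.
    print("within 1%:", relative_ppl_increase(5.0, 5.04) < 0.01)
```

On this toy input the outlier variant reconstructs the small weights far more accurately, which is the intuition behind SpQR. The sparse part is not free, though: each outlier costs an index plus a full-precision value, and the comment quoted above reports that, in k-quants experiments, separating outliers turned out to be less effective for ggml than other approaches. A single contrived vector does not contradict that.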
