LightCompress
Why does the QuaRot R2 rotation need online_rotate?
According to the original QuaRot method, the R2 rotation can be absorbed into the weights, so no online rotation should be needed. https://github.com/ModelTC/llmc/blob/867fb4f536073a2898048c39aa098979521a45a6/llmc/compression/quantization/quarot.py#L139
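To illustrate the point, here is a minimal NumPy sketch (not the llmc code) of why a per-head rotation can be folded into the value and output projections offline. All names (`W_v`, `W_o`, `R_block`, `attn_out`) are hypothetical, the weights use the `x @ W` convention rather than the `(out, in)` layout of `torch.nn.Linear`, and a random orthogonal matrix stands in for the Hadamard matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, hidden, n_heads, head_dim = 5, 16, 4, 4

X = rng.standard_normal((n_tokens, hidden))
W_v = rng.standard_normal((hidden, n_heads * head_dim))  # value projection (x @ W layout)
W_o = rng.standard_normal((n_heads * head_dim, hidden))  # output projection (x @ W layout)

# Arbitrary per-head attention probabilities; rows sum to 1.
P = rng.random((n_heads, n_tokens, n_tokens))
P /= P.sum(axis=-1, keepdims=True)

# Stand-in for R2: a random orthogonal head_dim x head_dim matrix (assumption;
# QuaRot uses a Hadamard matrix here).
R, _ = np.linalg.qr(rng.standard_normal((head_dim, head_dim)))

def attn_out(w_v, w_o):
    # Split values into heads, mix tokens per head, merge heads, project out.
    V = (X @ w_v).reshape(n_tokens, n_heads, head_dim).transpose(1, 0, 2)
    ctx = (P @ V).transpose(1, 0, 2).reshape(n_tokens, n_heads * head_dim)
    return ctx @ w_o

# Fold the same rotation into every head: block-diagonal R on the output side
# of W_v, and its transpose on the input side of W_o.
R_block = np.kron(np.eye(n_heads), R)
W_v_rot = W_v @ R_block
W_o_rot = R_block.T @ W_o

baseline = attn_out(W_v, W_o)
rotated = attn_out(W_v_rot, W_o_rot)
max_err = np.abs(baseline - rotated).max()
```

Because attention only mixes tokens within a head, the per-head rotation commutes with `P @ V` and cancels against its transpose in `W_o_rot`, so `baseline` and `rotated` agree up to floating-point error without any rotation at inference time.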
I implemented the reshape-based R2 transform from the SpinQuant code base, and this issue is fixed.
By the way, the online Q/K rotation (R3) is still missing.
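For context, a rough NumPy sketch of why R3 is usually applied online rather than folded into the weights, assuming it is a shared orthogonal rotation applied to queries and keys after RoPE. The `rope` helper is a simplified rotate-half variant written for this example, and a random orthogonal matrix again stands in for the Hadamard matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, head_dim = 6, 8

def rope(x, base=10000.0):
    # Simplified rotate-half RoPE for one head; x has shape (tokens, head_dim).
    half = x.shape[-1] // 2
    pos = np.arange(x.shape[0])[:, None]
    inv_freq = base ** (-np.arange(half) / half)
    ang = pos * inv_freq[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = rng.standard_normal((n_tokens, head_dim))
k = rng.standard_normal((n_tokens, head_dim))

# Stand-in for R3: a random orthogonal matrix (assumption).
R, _ = np.linalg.qr(rng.standard_normal((head_dim, head_dim)))

# Reference attention scores (scaling and softmax omitted).
scores = rope(q) @ rope(k).T

# Online rotation after RoPE: R cancels in the dot product.
scores_online = (rope(q) @ R) @ (rope(k) @ R).T

# Rotation applied before RoPE, which is what folding into W_q / W_k would do.
scores_folded = rope(q @ R) @ rope(k @ R).T
```

Here `scores_online` matches `scores`, while `scores_folded` generally does not, because the position-dependent RoPE rotation does not commute with an arbitrary `R`.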