ChatRWKV
GPTQ for RWKV
This is a work in progress and serves as the main thread for any questions related to this topic.
@BlinkDL Do I have to quantize blocks.1.att.* as well? (I am thinking of the key, value, and receptance weights.)
@3outeille Yes, do it for all matrix weights (ignore time_xxx).
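For reference, here is a minimal sketch of how one might pick out the weights to quantize from an RWKV-4 .pth checkpoint, keeping every 2-D matrix weight and skipping the time_* vectors. The checkpoint path and key layout are assumptions based on the standard RWKV-4 release, not something taken from this thread:

```python
import torch

# Sketch: select the matrix weights that GPTQ should quantize in an RWKV checkpoint,
# and leave the time_* mixing/decay vectors in full precision.
state_dict = torch.load("RWKV-4-Pile-169M.pth", map_location="cpu")  # example path

to_quantize = {}
for name, param in state_dict.items():
    if "time_" in name:      # time_mix_*, time_decay, time_first: not quantized
        continue
    if param.ndim == 2:      # att key/value/receptance/output, ffn matrices, emb, head
        to_quantize[name] = param

for name in sorted(to_quantize):
    print(name, tuple(to_quantize[name].shape))
```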
@BlinkDL Do you happen to have a reference perplexity measure (or some other metric) I can use as a baseline?
https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py: use the LAMBADA ppl there.
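For context, a minimal sketch of a LAMBADA-style perplexity computation. `model` and `tokenize` are stand-ins for whatever benchmark.py actually uses, so treat this as an illustration of the metric rather than the script itself:

```python
import math
import torch
import torch.nn.functional as F

# Sketch: perplexity over the final word of each LAMBADA example.
# Assumes model(tokens) returns logits of shape [seq_len, vocab_size]
# and tokenize(text) returns a list of token ids (both are stand-ins).
def lambada_ppl(model, tokenize, examples):
    total_nll, total_tokens = 0.0, 0
    for context, last_word in examples:        # each example: (prefix text, final word)
        ctx = tokenize(context)
        tgt = tokenize(" " + last_word)
        tokens = torch.tensor(ctx + tgt)
        with torch.no_grad():
            logits = model(tokens[:-1])        # logits at position t predict token t+1
        # only score the target word's tokens, since LAMBADA evaluates the last word
        tgt_logits = logits[len(ctx) - 1:]
        nll = F.cross_entropy(tgt_logits, tokens[len(ctx):], reduction="sum")
        total_nll += nll.item()
        total_tokens += len(tgt)
    return math.exp(total_nll / total_tokens)
```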
Question: would we expect a huge improvement wrt perplexity if we did quantization-aware training?
@meditans QAT would probably yield a big improvement, but it implies re-training the model, whereas GPTQ uses a post-training quantization strategy (no re-training involved).
How's it going :) are you on Discord?
Yep, I sent a message on Discord in the quantization channel.
Hi. Is it available now?
@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires changing the RWKV class too much, so the PR will not be accepted. However, I made it work with the Hugging Face version of RWKV if you want: https://github.com/3outeille/GPTQ-for-RWKV
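For anyone starting from the Hugging Face route, a minimal sketch of loading the HF RWKV port and listing the nn.Linear layers that a GPTQ pass would target. The checkpoint name is only an example, and the actual quantization loop lives in the linked GPTQ-for-RWKV repo:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Sketch: load the Hugging Face RWKV port and enumerate its linear layers,
# i.e. the matrices a GPTQ pass would quantize. Example checkpoint only.
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

linear_layers = {name: m for name, m in model.named_modules() if isinstance(m, nn.Linear)}
for name, m in linear_layers.items():
    print(name, m.in_features, "->", m.out_features)
```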