ChatRWKV
GPTQ for RWKV
This is a work in progress and serves as the main thread for any questions related to this topic.
@BlinkDL Do I have to quantize blocks.1.att.* as well? (I am thinking of the key, value, and receptance weights.)
@3outeille Yes, do it for all matrix weights (ignore time_xxx).
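For reference, here is a minimal sketch of how one might pick out the weights to quantize from an RWKV-4 .pth checkpoint, keeping every 2-D matrix weight and skipping the time_* vectors. The checkpoint path and key layout are assumptions based on the standard RWKV-4 release, not something taken from this thread:

```python
import torch

# Sketch: select the matrix weights that GPTQ should quantize in an RWKV checkpoint,
# and leave the time_* mixing/decay vectors in full precision.
state_dict = torch.load("RWKV-4-Pile-169M.pth", map_location="cpu")  # example path

to_quantize = {}
for name, param in state_dict.items():
    if "time_" in name:      # time_mix_*, time_decay, time_first: not quantized
        continue
    if param.ndim == 2:      # att key/value/receptance/output, ffn matrices, emb, head
        to_quantize[name] = param

for name in sorted(to_quantize):
    print(name, tuple(to_quantize[name].shape))
```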
@BlinkDL Do you happen to have a reference perplexity measure (or some other metric) I can use as a baseline?
https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py: use the LAMBADA ppl there.
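For context, a minimal sketch of a LAMBADA-style perplexity computation. `model` and `tokenize` are stand-ins for whatever benchmark.py actually uses, so treat this as an illustration of the metric rather than the script itself:

```python
import math
import torch
import torch.nn.functional as F

# Sketch: perplexity over the final word of each LAMBADA example.
# Assumes model(tokens) returns logits of shape [seq_len, vocab_size]
# and tokenize(text) returns a list of token ids (both are stand-ins).
def lambada_ppl(model, tokenize, examples):
    total_nll, total_tokens = 0.0, 0
    for context, last_word in examples:        # each example: (prefix text, final word)
        ctx = tokenize(context)
        tgt = tokenize(" " + last_word)
        tokens = torch.tensor(ctx + tgt)
        with torch.no_grad():
            logits = model(tokens[:-1])        # logits at position t predict token t+1
        # only score the target word's tokens, since LAMBADA evaluates the last word
        tgt_logits = logits[len(ctx) - 1:]
        nll = F.cross_entropy(tgt_logits, tokens[len(ctx):], reduction="sum")
        total_nll += nll.item()
        total_tokens += len(tgt)
    return math.exp(total_nll / total_tokens)
```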
Question: would we expect a huge improvement wrt perplexity if we did quantization-aware training?
@meditans QAT would probably yield a big improvement, but it implies re-training the model, whereas GPTQ uses a post-training quantization strategy (no re-training involved).
How's it going :) are you on Discord?
Yep, I sent a message on Discord in the quantization channel.
Hi. Is it available now?
@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires changing the RWKV class too much, so the PR will not be accepted. However, I made it work with the Hugging Face version of RWKV if you want: https://github.com/3outeille/GPTQ-for-RWKV
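For anyone starting from the Hugging Face route, a minimal sketch of loading the HF RWKV port and listing the nn.Linear layers that a GPTQ pass would target. The checkpoint name is only an example, and the actual quantization loop lives in the linked GPTQ-for-RWKV repo:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Sketch: load the Hugging Face RWKV port and enumerate its linear layers,
# i.e. the matrices a GPTQ pass would quantize. Example checkpoint only.
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")

linear_layers = {name: m for name, m in model.named_modules() if isinstance(m, nn.Linear)}
for name, m in linear_layers.items():
    print(name, m.in_features, "->", m.out_features)
```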