
GPTQ for RWKV

Open 3outeille opened this issue 2 years ago • 10 comments

This is a work in progress and serves as the main thread for any questions related to this topic.

3outeille avatar Apr 19 '23 09:04 3outeille

@BlinkDL Do I have to quantize blocks.1.att.* as well? (I am thinking of the key, value, and receptance weights)

3outeille avatar Apr 19 '23 14:04 3outeille

@3outeille yes, do it for all matrix weights (ignore the time_xxx parameters)

BlinkDL avatar Apr 20 '23 07:04 BlinkDL
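
As a concrete reading of that advice, here is a minimal sketch of picking out the parameters GPTQ should touch from an RWKV checkpoint: all 2-D matrix weights, skipping the time_* vectors. The checkpoint filename is an assumption; any RWKV-4 .pth state dict should work the same way.

```python
# Sketch: select the matrices to quantize from an RWKV state dict.
# The path below is hypothetical; substitute the checkpoint you actually load.
import torch

state_dict = torch.load("RWKV-4-Pile-169M.pth", map_location="cpu")

to_quantize = {
    name: w
    for name, w in state_dict.items()
    if w.dim() == 2 and "time_" not in name  # matrices only; skip time_mix/time_decay/time_first
}

for name, w in to_quantize.items():
    print(name, tuple(w.shape))
```

This picks up the att/ffn key, value, receptance and output projections (plus emb.weight and head.weight), while the 1-D layer norms and time_* parameters fall through the filter.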

@BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

3outeille avatar Apr 25 '23 10:04 3outeille

> @BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

Use the LAMBADA ppl from https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py as the baseline.

BlinkDL avatar Apr 25 '23 14:04 BlinkDL
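
For an apples-to-apples comparison the linked v2/benchmark.py is the reference; purely to illustrate what a LAMBADA perplexity number measures, a rough sketch using the HuggingFace RWKV port could look like the following (the model id and dataset name are assumptions, not taken from the thread):

```python
# Rough LAMBADA perplexity check, for illustration only.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # hypothetical checkpoint choice
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

data = load_dataset("lambada", split="test[:200]")  # small slice for a quick check

nll, n_tokens = 0.0, 0
with torch.no_grad():
    for row in data:
        ids = tok(row["text"], return_tensors="pt").input_ids
        out = model(ids, labels=ids)            # HF returns mean cross-entropy over shifted tokens
        nll += out.loss.item() * (ids.numel() - 1)
        n_tokens += ids.numel() - 1

print("ppl:", math.exp(nll / n_tokens))
```

Running the same script before and after quantization gives a quick sanity check that the perplexity has not degraded badly, even if the absolute numbers differ from benchmark.py.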

Question: would we expect a huge improvement in perplexity if we did quantization-aware training?

meditans avatar Apr 26 '23 23:04 meditans

@meditans QAT would probably yield a large improvement, but it implies re-training your model, whereas GPTQ uses a post-training quantization strategy (no re-training involved).

3outeille avatar Apr 27 '23 08:04 3outeille
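
To make the distinction concrete (this is not GPTQ itself, just an illustration): post-training quantization rewrites the already-trained weights once, with no gradient updates, whereas QAT would run a fake-quantize step like the one below inside the forward pass and keep training so the weights adapt to it.

```python
# Illustration of the PTQ vs QAT distinction with simple round-to-nearest quantization.
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # symmetric per-tensor round-to-nearest, for illustration only
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

linear = torch.nn.Linear(8, 8)
with torch.no_grad():                      # PTQ: one-shot rewrite, no re-training
    linear.weight.copy_(fake_quantize(linear.weight))
```

GPTQ improves on plain round-to-nearest by using calibration data to minimize the per-layer output error, but it stays in the post-training regime.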

How's it going :) Are you on Discord?

BlinkDL avatar May 08 '23 04:05 BlinkDL

Yep, I sent a message in the quantization channel on Discord.

3outeille avatar May 09 '23 08:05 3outeille

Hi. Is it available now?

Evilran avatar May 19 '23 06:05 Evilran

@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires changing the RWKV class too heavily, so the PR would not be accepted. However, I made it work with the HuggingFace version of RWKV if you want: https://github.com/3outeille/GPTQ-for-RWKV

3outeille avatar Jun 03 '23 08:06 3outeille
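
The exact entry points live in the linked GPTQ-for-RWKV repo; the sketch below only shows the HuggingFace side it builds on, i.e. loading the transformers RWKV port that the quantization is applied to. The model id is an assumption.

```python
# Sketch: load the HuggingFace RWKV port that the linked repo targets.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # hypothetical checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# quick generation check, e.g. before and after applying the quantization
ids = tok("The quick brown fox", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20)
print(tok.decode(out[0]))
```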