
Consider uploading some quantized checkpoints to Hugging Face

Open Calandiel opened this issue 2 years ago • 2 comments

Correct me if I'm wrong, but quantizing would require loading the models in their unquantized form (as per the torch.load call in https://github.com/saharNooby/rwkv.cpp/blob/master/rwkv/convert_pytorch_to_ggml.py, line 126). Not to mention how much more bandwidth the unquantized models take to download.
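Roughly, the loading step in question looks like this (a simplified sketch, not the exact script; the real converter also writes a ggml header and per-tensor metadata):

```python
# Sketch of the conversion step: torch.load materializes the entire
# unquantized state dict in RAM before anything is written out.
import torch

def convert(src_path: str, dst_path: str) -> None:
    state_dict = torch.load(src_path, map_location="cpu")  # whole model in RAM
    with open(dst_path, "wb") as out:
        for name, tensor in state_dict.items():
            # header and metadata writing omitted for brevity
            out.write(tensor.to(torch.float32).numpy().tobytes())
```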

Calandiel · Apr 21 '23

Only the PyTorch -> rwkv.cpp conversion requires loading the whole model into RAM; quantization is done tensor-by-tensor. You are right about the bandwidth, though.

I'll consider it, thanks for the suggestion!
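To illustrate why the quantization step itself stays light on memory, here is a self-contained sketch of per-block quantization in the spirit of Q4_0 (illustrative only; the actual ggml block layout and rounding differ, e.g. it packs two 4-bit values per byte):

```python
import numpy as np

def quantize_q4_0_like(tensor: np.ndarray, block: int = 32):
    """Per-block scale plus 4-bit integers, in the spirit of ggml's Q4_0.
    Illustrative only, not the real on-disk format."""
    flat = tensor.astype(np.float32).ravel()
    flat = np.pad(flat, (0, -flat.size % block))   # pad to whole blocks
    blocks = flat.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return scales.astype(np.float32), q

# Because each tensor is handled independently, a quantizer can stream
# through the file and keep only one tensor in memory at a time:
#   for name, tensor in tensors:            # read one tensor
#       scales, q = quantize_q4_0_like(tensor)
#       write(out, name, scales, q)         # write it, then move on
```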

saharNooby · Apr 21 '23

I have uploaded some quantized RWKV-4-Raven models to HuggingFace at LoganDark/rwkv-4-raven-ggml. Conversion took about 2 hours, and upload took about 24 hours and 500GB of disk space.

At the time of writing, the available models are:

| Name | f32 | f16 | Q4_0 | Q4_1 | Q4_2 | Q5_1 | Q8_0 |
|------|-----|-----|------|------|------|------|------|
| RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-3B-v11-Eng99-20230425-ctx4096 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-7B-v11x-Eng99-20230429-ctx8192 | Yes | Yes | Yes | No | Yes | Yes | Yes |
| RWKV-4-Raven-14B-v11x-Eng99-20230501-ctx8192 | Split | Yes | Yes | No | Yes | Yes | Yes |

Feel free to create a discussion if you have a request.
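If it helps, the files can be fetched programmatically with huggingface_hub; the filename below is a guess at the naming scheme, so check the repo's file list for the exact name:

```python
from huggingface_hub import hf_hub_download

# repo_id is real; the filename is an assumed example of the naming scheme.
path = hf_hub_download(
    repo_id="LoganDark/rwkv-4-raven-ggml",
    filename="RWKV-4-Raven-1B5-v11-Eng99-20230425-ctx4096-Q4_0.bin",
)
print(path)  # local path to the cached model file
```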

LoganDark · May 19 '23