PENG Bo
Also see https://github.com/TorchRWKV/rwkv-kit
https://github.com/OpenMOSE/RWKV-LM-RLHF
Someone mentioned this before; I'll push for it to be fixed.
Remember to set --strategy deepspeed_stage_2 --grad_cp 1 and use DS_BUCKET_MB=200. 0.1B can train on 8G VRAM (single GPU), for example: https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v7/train_temp/demo-training-run.sh
You can pass the "state" of RWKV from one call to the next, because it's an RNN.
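A minimal sketch of what passing the state means, using a toy recurrent cell rather than the real RWKV kernel. The names `step` and `run`, the scalar state, and the `decay` value are all assumptions made for illustration. The point is that running a sequence in two chunks while carrying the state across gives the same result as running it in one pass.

```python
# Toy recurrent cell (NOT the real RWKV kernel): the state is a single float.
def step(state, x, decay=0.9):
    """Consume one input and return (output, new_state)."""
    new_state = decay * state + x
    return new_state, new_state


def run(tokens, state=0.0):
    """Process tokens in order; return all outputs and the final state."""
    outputs = []
    for x in tokens:
        out, state = step(state, x)
        outputs.append(out)
    return outputs, state


tokens = [1.0, 2.0, 3.0, 4.0]

# One pass over the whole sequence.
full_out, full_state = run(tokens)

# Two passes, carrying the state from the first chunk into the second.
out_a, state_a = run(tokens[:2])
out_b, state_b = run(tokens[2:], state=state_a)

# True when chunked processing reproduces the single-pass result.
chunked_matches = (out_a + out_b == full_out) and (state_b == full_state)
```

In a real model the state would be a set of tensors per layer, but the calling pattern is the same: keep the state returned by one call and feed it to the next.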
Use the final rwkv-xx.pth, the one with the largest xx. It's highly recommended to train rwkv-6, which will give you rwkv-final.pth.
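A small sketch for picking the checkpoint with the largest xx, assuming the files are named exactly rwkv-<number>.pth. The helper name `latest_checkpoint` is hypothetical. It compares the numbers as integers, because a plain string sort would put rwkv-3.pth after rwkv-12.pth. Non-numeric names such as rwkv-final.pth or rwkv-init.pth are ignored here.

```python
import re
from pathlib import Path


def latest_checkpoint(names):
    """Return the rwkv-<N>.pth entry with the largest integer N, or None."""
    pattern = re.compile(r"^rwkv-(\d+)\.pth$")
    best_name = None
    best_n = -1
    for name in names:
        # Match on the file name only, so full paths also work.
        match = pattern.match(Path(name).name)
        if match and int(match.group(1)) > best_n:
            best_n = int(match.group(1))
            best_name = name
    return best_name


# Hypothetical directory listing for illustration.
files = ["rwkv-0.pth", "rwkv-12.pth", "rwkv-3.pth", "rwkv-init.pth"]
chosen = latest_checkpoint(files)
```

To use it on a real output directory, pass something like `[str(p) for p in Path("out").glob("*.pth")]`, where "out" stands in for your own checkpoint folder.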
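History of the RWKV architecture: https://rwkv.cn/RWKV-Architecture
Hi, everyone is welcome to ask questions in the technical QQ group, 325154699. For the various papers on RWKV, see https://rwkv.com. There are no applications to graphs yet, but it can certainly be swapped in directly. If you run into problems, feel free to ask.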
@ChangyongYang https://github.com/OpenMOSE/RWKV-Infer
it's ```u @ t, 1 @ t-1, w @ t-2, w^2 @ t-3, ...```
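A quick sketch of that pattern, assuming it lists the weight applied to each position counted backward from the current step t: u for the current token, then 1, w, w^2, and so on for earlier tokens. The function name `weights_from_current` and the scalar u and w are assumptions for illustration. In a real model these would be per-channel values.

```python
def weights_from_current(n, u, w):
    """Return n weights ordered [t, t-1, t-2, ...]: u, 1, w, w**2, ..."""
    if n <= 0:
        return []
    # Position t gets the bonus u; position t-1-k gets w**k for k = 0, 1, 2, ...
    return [u] + [w ** k for k in range(n - 1)]


# Illustrative values only.
example = weights_from_current(4, u=2.0, w=0.5)
```

With 0 < w < 1 the weights on older tokens shrink geometrically, while the current token gets its own separate weight u.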