PENG Bo comments

Results: 265 comments of PENG Bo

Regarding the RUN_CUDA_RWKV6 part, it would be best to implement it in PyTorch; otherwise it is hard to port

Also see https://github.com/TorchRWKV/rwkv-kit

How to apply GRPO

https://github.com/OpenMOSE/RWKV-LM-RLHF

RWKV-v7 training

Remember to set --strategy deepspeed_stage_2 --grad_cp 1 and use DS_BUCKET_MB=200. The 0.1B model can train on 8 GB of VRAM (single GPU); for example: https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v7/train_temp/demo-training-run.sh

With RWKV-v4, if I wish to build an encoder-decoder model, for example for translation, which hidden states need to be passed between the encoder and the decoder? Can you provide some guidelines on this, or point to any existing work?

You can pass the "state" of RWKV, because it's an RNN.
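A minimal sketch of what "passing the state" means for any RNN. The cell below is a toy stand-in, not RWKV itself: `rnn_step`, `run`, the decay value, and the two-element state are all hypothetical names and choices made for illustration. The point is only that the encoder's final state becomes the decoder's initial state.

```python
import math

def rnn_step(token, state, decay=0.5):
    """One step of a toy recurrent cell (an assumption, not the RWKV update rule)."""
    # New state: decayed old state plus the current input, element by element.
    new_state = [decay * s + x for s, x in zip(state, token)]
    output = [math.tanh(s) for s in new_state]
    return output, new_state

def run(tokens, state):
    """Feed a sequence through the cell, returning the last output and final state."""
    output = None
    for tok in tokens:
        output, state = rnn_step(tok, state)
    return output, state

# "Encoder": consume the source sequence from an empty state and keep the final state.
source = [[1.0, 0.0], [0.0, 1.0]]
_, enc_state = run(source, [0.0, 0.0])

# "Decoder": start from the encoder's state instead of an empty one.
target_prefix = [[0.5, 0.5]]
dec_out, dec_state = run(target_prefix, enc_state)

# For comparison: the same decoder input without the encoder's state.
_, fresh_state = run(target_prefix, [0.0, 0.0])
```

Here `dec_state` differs from `fresh_state`, which is the sense in which the decoder "sees" the source sequence through the handed-over state.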

RWKV-v4 training doesn't stop after the defined max_epochs

Use the final rwkv-xx.pth with the largest xx. It is highly recommended to train RWKV-6, which will give you rwkv-final.pth.
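Picking the checkpoint with the largest xx can be sketched as follows. This assumes xx is a plain integer in the file name; `latest_checkpoint` is a hypothetical helper written for illustration, not part of the RWKV-LM repository.

```python
import re

def latest_checkpoint(filenames):
    """Return the rwkv-<number>.pth name with the largest number, or None."""
    pattern = re.compile(r"^rwkv-(\d+)\.pth$")
    numbered = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            # Compare numerically so that rwkv-10.pth ranks above rwkv-2.pth.
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]
```

In practice the list could come from `os.listdir()` on the output directory; names that are not numbered, such as rwkv-final.pth, are ignored by this helper.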

Why is the wkv operation designed this way?

History of the RWKV architecture: https://rwkv.cn/RWKV-Architecture

Batch Inference

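Hi, everyone is welcome to ask in the technical QQ group 325154699. Papers about RWKV are listed at https://rwkv.com. There are no graph applications yet, but it can certainly be swapped in directly. Feel free to ask if you run into problems.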

Batch Inference

@ChangyongYang https://github.com/OpenMOSE/RWKV-Infer

Probable mistake in Eq. 19 in the arxiv paper "Eagle and Finch"

it's ```u @ t, 1 @ t-1, w @ t-2, w^2 @ t-3, ...```
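The weighting pattern above can be written out as a small sketch. It treats `w` and `u` as scalars for a single channel (a simplification assumed here), and `position_weights` is a hypothetical helper name, not a function from the RWKV code.

```python
def position_weights(t, w, u):
    """Weights applied to positions 1..t when computing the output at step t.

    Position t gets u, position t-1 gets 1, position t-2 gets w,
    position t-3 gets w**2, and so on.
    """
    weights = []
    for i in range(1, t + 1):
        if i == t:
            weights.append(u)  # bonus for the current token
        else:
            weights.append(w ** (t - 1 - i))  # decay grows with distance
    return weights

example = position_weights(4, 0.5, 2.0)
```

With `t = 4`, `w = 0.5` and `u = 2.0`, the list runs from the oldest position to the current one: `w^2`, `w`, `1`, `u`.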
