RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, sa...
Is it possible to use a BPE tokenizer instead of rwkv_vocab_v20230424 in the next model? I tried the RWKV model on Thai. It looks good, but it is very slow because Thai is...
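One way to see the slowdown concretely is to compare token counts for the same Thai sentence under the repo's vocab and a BPE tokenizer. The sketch below is only illustrative: the tokenizer path and class name (`tokenizer/rwkv_tokenizer.py`, `TRIE_TOKENIZER`, `rwkv_vocab_v20230424.txt`) are assumed from the repo layout, and `gpt2` is just a stand-in BPE that is not Thai-tuned either.

```python
# Rough sketch: compare token counts for Thai text under the RWKV world
# tokenizer vs. a generic BPE tokenizer. Paths and class names are assumptions.
from tokenizer.rwkv_tokenizer import TRIE_TOKENIZER   # assumed location inside this repo
from transformers import AutoTokenizer                # any BPE tokenizer, for comparison

thai = "ภาษาไทยเขียนติดกันโดยไม่เว้นวรรคระหว่างคำ"    # Thai is written without spaces between words

rwkv_tok = TRIE_TOKENIZER("tokenizer/rwkv_vocab_v20230424.txt")  # assumed vocab path
bpe_tok = AutoTokenizer.from_pretrained("gpt2")                  # stand-in BPE, not Thai-tuned

print("rwkv_vocab_v20230424:", len(rwkv_tok.encode(thai)), "tokens")
print("BPE (gpt2):", len(bpe_tok.encode(thai)), "tokens")
```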
To bring more awareness and adoption to RWKV, would it be possible to get benchmark scores on the Hugging Face LLM leaderboard, or on the model cards themselves (for RWKV-6 and...
RWKV_TimeMix operates along the sequence dimension. During training, the training data is usually concatenated end to end, so the individual sequences need to be separated and processed independently. FlashAttention, for example, accepts an input marking where each sequence starts, but RUN_CUDA does not seem to have one. How is this handled?
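For illustration only (this is not how the repo's RUN_CUDA kernel works): one hedged workaround is to reset the recurrent state whenever a document-separator token is seen, as in the sketch below; `step_fn` and `sep_id` are hypothetical names.

```python
# Illustration only, not the actual RUN_CUDA kernel: reset the recurrent state
# at every document separator so concatenated documents do not leak state.
# step_fn and sep_id are hypothetical names.
import torch

def rnn_forward_with_resets(step_fn, init_state, tokens, sep_id=0):
    """step_fn(state, token) -> (new_state, output); state is reset after each sep_id."""
    state, outputs = init_state, []
    for tok in tokens.tolist():
        state, out = step_fn(state, tok)
        outputs.append(out)
        if tok == sep_id:            # the next document starts from a fresh state
            state = init_state
    return torch.stack(outputs)
```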
# Fix broken `accumulate_grad_batches` argument in v5 trainer
While trying to fine-tune some of the RWKV-7-Pile models, I found that the `accumulate_grad_batches` argument sent to the main trainer file had...
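For reference, a minimal sketch of how gradient accumulation is usually handed to the PyTorch Lightning Trainer; the other argument values below are placeholders, not the repo's actual train.py flags.

```python
# Sketch of how accumulate_grad_batches is normally forwarded to the PyTorch
# Lightning Trainer; the other argument values are placeholders, not the
# repo's real train.py defaults.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision="bf16",
    accumulate_grad_batches=4,   # effective batch size = micro batch size * 4
)
```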
rwkv_v7_demo.py sets args.vocab_size = 50304, but the checkpoint (0.1B) actually has 65536. Loading it fails with:
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RWKV: Missing key(s) in state_dict: "blocks.0.att.v0", "blocks.0.att.v1", "blocks.0.att.v2". size...
I'd like to ask where this error is coming from.
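A quick, hedged way to confirm the vocab mismatch is to read the embedding shape straight from the checkpoint before building the model; the filename below is a placeholder, and `emb.weight` is assumed to be the embedding key in the state dict.

```python
# Sketch: read the real vocab size from the checkpoint before setting
# args.vocab_size. The filename is a placeholder; 'emb.weight' is assumed
# to be the embedding key in the state dict.
import torch

sd = torch.load("your-rwkv7-checkpoint.pth", map_location="cpu")
vocab_size, n_embd = sd["emb.weight"].shape
print(vocab_size, n_embd)   # e.g. 65536 for world-vocab checkpoints, 50304 for Pile-style vocab
```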
Hello, could you explain what head_size_divisor means and how it relates to head_size? Also, in self.ln_x = nn.GroupNorm(H, C, eps=(1e-5) * (self.head_size_divisor ** 2)) # !!! notice eps value !!!, why is eps defined this way? Thanks.
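Not an official answer, but one way to read that eps: with default affine parameters, GroupNorm(x / D, eps=1e-5) equals GroupNorm(x, eps=1e-5 * D**2), so scaling eps by head_size_divisor**2 normalizes x as if it had been pre-divided by D. A minimal numerical check of that equivalence:

```python
# Numerical check (not official docs): with default affine parameters,
# GroupNorm(x / D, eps=1e-5) == GroupNorm(x, eps=1e-5 * D**2), so the scaled
# eps normalizes x as if it had been pre-divided by head_size_divisor.
import torch
import torch.nn as nn

H, C, D = 8, 512, 8                         # heads, channels, head_size_divisor
gn_plain = nn.GroupNorm(H, C, eps=1e-5)
gn_scaled = nn.GroupNorm(H, C, eps=(1e-5) * (D ** 2))

x = torch.randn(4, C) * 100                 # large activations, e.g. the wkv output
print(torch.allclose(gn_plain(x / D), gn_scaled(x), atol=1e-4))   # True
```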
Part of the final output should be o_t = r_t @ S_t^T, but it looks like the S computed in the diagram is not transposed, right? (The code is correct.) @BlinkDL
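To make the transpose question concrete, here is a small sketch assuming the convention that the state S accumulates outer products v_i k_i^T (rows indexed by the value dim, columns by the key dim); under that convention the readout is o_t = r_t @ S_t^T, and the transpose disappears only if S is stored in the transposed (k v^T) layout instead.

```python
# Sketch (assuming the state accumulates outer products S = sum_i v_i k_i^T):
# under that convention the readout needs the transpose, o_t = r_t @ S_t^T.
import torch

d = 64
ks = torch.randn(10, d)                                   # keys seen so far
vs = torch.randn(10, d)                                   # values seen so far
S = sum(torch.outer(v, k) for v, k in zip(vs, ks))        # S = sum_i v_i k_i^T

r = torch.randn(d)
o1 = r @ S.T                                              # readout with the transpose
o2 = sum((k @ r) * v for k, v in zip(ks, vs))             # sum_i (k_i . r) v_i
print(torch.allclose(o1, o2, atol=1e-4))                  # True
```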
Added context about data.