RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, sa...

Results: 161 RWKV-LM issues

I am testing RWKV-v7 0.4B for training, but it is not working as I expected. How much memory do you use for this model, or for the 1.5B or 3B...
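As a rough back-of-envelope check for memory questions like this one (assumptions mine: bf16 weights, an fp32 master copy, and Adam moments in fp32; activations and gradients excluded), the static training-state footprint of a 0.4B-parameter model can be estimated as:

```python
def training_state_gib(n_params: float,
                       weight_bytes: int = 2,    # bf16 weights
                       master_bytes: int = 4,    # fp32 master copy
                       moment_bytes: int = 8) -> float:  # Adam m + v in fp32
    """Estimate weights + optimizer state in GiB (activations/gradients excluded)."""
    total_bytes = n_params * (weight_bytes + master_bytes + moment_bytes)
    return total_bytes / 2**30

# 0.4e9 params * 14 bytes each -> roughly 5.2 GiB before activations/gradients
print(round(training_state_gib(0.4e9), 1))  # -> 5.2
```

Actual usage will be higher once activations, gradients, and CUDA workspace are included, and lower with sharded optimizers or 8-bit optimizer states.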

Can someone put the pictures in a subdirectory? Otherwise they look a bit messy.

The first block has extra attention parameters ("blocks.0.att.v1", "blocks.0.att.v2", "blocks.0.att.v0"); the other layers are normal. ![Image](https://github.com/user-attachments/assets/1dc2d72e-2027-4a6e-b7a1-c2f98db47753) ![Image](https://github.com/user-attachments/assets/6338cd61-0ce1-4d8c-9d11-55610c73a03b)

Hello, I am trying to use the RWKV4 model to process a sequential pkl dataset. However, when I use the CUDA kernel, I encounter an error: `UnicodeDecodeError: 'gbk' codec can't decode...`

Hello! I wonder whether RWKV7 used a sequence-packing strategy during pre-training. If so, do the samples need to be masked from each other?
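For context on the question: sequence packing concatenates several short samples into one training sequence, and cross-sample masking then prevents positions in one sample from mixing with another. A minimal illustrative sketch (my own, not RWKV's training code; whether RWKV-7 actually does this is exactly what the issue asks) builds a causal block-diagonal mask from the packed sample lengths:

```python
def packing_mask(lengths):
    """Causal block-diagonal mask for packed samples.

    mask[i][j] is True iff position j may influence position i:
    j <= i (causal) and both positions belong to the same sample.
    """
    doc = []                              # document id per position
    for d, n in enumerate(lengths):
        doc.extend([d] * n)
    size = len(doc)
    return [[doc[i] == doc[j] and j <= i for j in range(size)]
            for i in range(size)]

m = packing_mask([2, 3])  # two packed samples of lengths 2 and 3
# position 2 starts the second sample, so it must not see positions 0-1
print(m[2][:3])  # -> [False, False, True]
```

For an RNN like RWKV the equivalent operation is resetting the recurrent state at each sample boundary rather than applying an attention mask, but the boundary bookkeeping is the same.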

The state in the code is generated inside the CUDA file. Why is there no need to explicitly store the state?

Using PyTorch's built-in fused operators, which internally use fp32 for the forward computation, improves both speed and accuracy.

How can GRPO methods be applied for further training of the RWKV model?

Are there any plans to release a reasoning model?

The `__getitem__` method did not return any value when `args.data_type == "uint16"`, causing data loaders to receive `None`. Added an explicit `return x, y` to match the behavior of the other...
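The bug described amounts to a branch that computed `x` and `y` but fell through without returning them, so the loader saw `None`. A minimal sketch of the corrected shape (my own simplified stand-in, not the actual `MyDataset` code; the real class has more data-type branches):

```python
class UInt16Dataset:
    """Sketch of a next-token dataset whose uint16 branch previously
    lacked a return statement, so __getitem__ yielded None."""

    def __init__(self, data, ctx_len):
        self.data = data          # flat token list (stand-in for a uint16 array)
        self.ctx_len = ctx_len

    def __getitem__(self, idx):
        chunk = self.data[idx : idx + self.ctx_len + 1]
        x = chunk[:-1]            # input tokens
        y = chunk[1:]             # next-token targets
        return x, y               # the previously missing explicit return

ds = UInt16Dataset(list(range(10)), ctx_len=4)
x, y = ds[0]
print(x, y)  # -> [0, 1, 2, 3] [1, 2, 3, 4]
```

Without the `return`, Python implicitly returns `None`, which a `DataLoader` would then fail to collate, matching the symptom in the report.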