RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable). So it combines the best of RNN and transformer - great performance, fast inference, sa...
I am testing RWKV-v7 0.4B for training, but it does not seem to work as I expected. How much memory do you use for this model, or for the other 1.5B or 3B...
Can someone put the pictures in a subdirectory? Otherwise they look a bit messy.
The first block has extra attention parameters ("blocks.0.att.v1", "blocks.0.att.v2", "blocks.0.att.v0"); the other layers are normal.
Hello, I am trying to use the RWKV4 model to process a sequential pkl dataset. However, when I use the CUDA kernel, I encounter an error: UnicodeDecodeError: 'gbk' codec can't decode...
Hello! I wonder whether RWKV7 used a sequence packing strategy during pre-training. If so, do the samples need to be masked from each other?
The state in the code is generated inside the CUDA file; why is there no need to explicitly store the state?
Using PyTorch's built-in fused operators, which internally utilize fp32 for forward computation, improves both speed and accuracy.
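A minimal sketch of the idea, using F.layer_norm purely as an illustration of a fused operator (the specific op and shapes are assumptions, not necessarily what the repo uses): the fused call replaces a chain of elementary bf16 ops, so intermediate reductions are not repeatedly rounded to bf16.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 1024, dtype=torch.bfloat16, device=device)
w = torch.ones(1024, dtype=torch.bfloat16, device=device)
b = torch.zeros(1024, dtype=torch.bfloat16, device=device)

# Manual, unfused version: every intermediate tensor is materialized in bf16.
mean = x.mean(-1, keepdim=True)
var = (x - mean).pow(2).mean(-1, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + 1e-5) * w + b

# Fused version: a single kernel which (per the note above) handles the
# forward computation in fp32 internally before casting back to bf16.
y_fused = F.layer_norm(x, (1024,), w, b, eps=1e-5)

# The two results should agree closely; any gap comes from bf16 rounding
# of the intermediates in the manual version.
print((y_manual - y_fused).abs().max())
```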
How can GRPO methods be applied to further train the RWKV model?
rwkv-r1?
Are there any plans to release a reasoner model?
The __getitem__ method did not return any value when args.data_type == "uint16", causing data loaders to receive None. Added an explicit `return x, y` to match the behavior of other...
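A minimal sketch of the fix described above, with hypothetical class and field names (the repo's actual dataset class differs): the point is simply that the "uint16" branch of `__getitem__` must end with `return x, y` so the data loader never receives None.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class UInt16Dataset(Dataset):
    """Hypothetical stand-in for the repo's dataset class (uint16 token ids)."""
    def __init__(self, data, ctx_len):
        self.data = data          # 1-D uint16 array of token ids
        self.ctx_len = ctx_len

    def __len__(self):
        return len(self.data) - self.ctx_len - 1

    def __getitem__(self, idx):
        chunk = self.data[idx : idx + self.ctx_len + 1].astype(np.int64)
        x = torch.from_numpy(chunk[:-1])   # input tokens
        y = torch.from_numpy(chunk[1:])    # next-token targets
        return x, y                        # the return that was missing in the "uint16" branch

# Quick check that the loader no longer receives None.
ds = UInt16Dataset(np.random.randint(0, 65535, size=4096, dtype=np.uint16), ctx_len=128)
x, y = ds[0]
print(x.shape, y.shape)   # torch.Size([128]) torch.Size([128])
```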