
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, sa...
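To make the "RNN at inference, GPT-style parallel at training" claim concrete, here is a minimal, unoptimized sketch of the core WKV recurrence from RWKV-v4 (single sequence, one head, no numerical-stability tricks; the repo's CUDA kernels compute the same quantity, in parallel, for training):

```python
import torch

def wkv_recurrent(w, u, k, v):
    """Naive RWKV WKV recurrence.
    w, u: (C,) per-channel decay rate and current-token bonus
    k, v: (T, C) keys and values; returns (T, C) outputs."""
    T, C = k.shape
    num = torch.zeros(C)  # decayed running sum of exp(k_i) * v_i
    den = torch.zeros(C)  # decayed running sum of exp(k_i)
    out = torch.empty(T, C)
    for t in range(T):
        cur = torch.exp(u + k[t])                  # current token gets bonus u
        out[t] = (num + cur * v[t]) / (den + cur)  # weighted average of values
        decay = torch.exp(-w)                      # state decays every step
        num = decay * num + torch.exp(k[t]) * v[t]
        den = decay * den + torch.exp(k[t])
    return out
```

Because the whole state is just (num, den), inference costs O(1) memory per token like an RNN, while training can unroll the same computation over T positions in parallel.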

Results: 109 RWKV-LM issues, sorted by recently updated.

Guidance is a high-level wrapper library that embeds prompt parameters in natural language, and it combines a variety of techniques that can produce output matching an exact schema to...
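For context, the core idea behind such libraries is that fixed template text is emitted verbatim and only the slots are generated under constraints, so the result always matches the schema. A toy illustration of that idea only (not guidance's actual API; `fake_generate` is a placeholder for a constrained model call):

```python
import re

# Toy "constrained decoding": the template fixes the schema, only the
# slots are generated, so the result always parses.
template = '{{"name": "{name}", "age": {age}}}'

def fake_generate(slot, pattern):
    # stand-in for a model call that is forced to match `pattern`
    sample = {"name": "Alice", "age": "30"}[slot]
    assert re.fullmatch(pattern, sample)
    return sample

output = template.format(
    name=fake_generate("name", r"[A-Za-z ]+"),
    age=fake_generate("age", r"\d+"),
)
print(output)  # always valid JSON with exactly the keys the schema demands
```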

Just added a fix for the deepspeed import, which causes an error later.

Hi all, I have trained RWKV-v4neo from scratch. After going through some issues, it seems that I need to execute run.py in RWKV-v4 to test my model. I changed the...
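If run.py in RWKV-v4 proves awkward, one common alternative (an assumption on my part, not the repo's official answer) is to load the checkpoint with the `rwkv` pip package from ChatRWKV; a minimal sketch, with the model path and strategy string as placeholders for your own setup:

```python
# pip install rwkv  (the ChatRWKV inference package)
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="out/rwkv-final", strategy="cuda fp16")  # your .pth, no extension
pipeline = PIPELINE(model, "20B_tokenizer.json")            # tokenizer file from ChatRWKV

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
print(pipeline.generate("The quick brown fox", token_count=64, args=args))
```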

Thank you for your work. I am borrowing your RWKV structure to try to implement a multimodal VLM. Here RWKV plays a Qformer-like role, i.e. a ViT->RWKV->LLM pipeline, trained in two stages (pretrain & SFT) using DeepSpeed. However, many problems appear during backpropagation, mainly these two:

1. When doing forward and backward with the custom CUDA kernel, the pretrain stage is fine (ViT and LLM frozen, training only the RWKV in the projector layer), but after unfreezing the LLM in the SFT stage I keep getting "CUDA ERROR: an illegal memory access was encountered". The error location varies, but it definitely occurs during backpropagation; if I comment out the RWKV module (keeping only a linear layer to convert dimensions in the forward pass), there is no problem.
2. When not using the custom CUDA kernel and doing the forward pass with the rwkv_linear_attention_cpu function instead (although this function was written for CPU execution, my understanding is that it actually implements RWKV's core computation, and as long as key's device is cuda these operations still run on the GPU). The problem with this function is that after a batch finishes its forward pass, the backward pass waits indefinitely until timeout (it only deadlocks with multiple GPUs; I suspect the multi-GPU gradient aggregation is at fault, since single-GPU training with this function works fine).

pengbo, could you give some feedback and a tentative analysis? The LLM itself is only 1.8B and the batch size is small; I have also been monitoring the A100 80GB card, and GPU memory is not exceeded.

P.S.: With the custom-CUDA-kernel forward and backward, there is also a small chance of this error: File python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward...
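Not an official fix, but a sketch of how one might start localizing the two failures above, assuming plain PyTorch DDP semantics underneath (DeepSpeed has its own configuration for this; `model` below is a stand-in for the real ViT->RWKV->LLM stack):

```python
import os
import torch
import torch.nn as nn

# Make CUDA errors synchronous so the illegal-memory-access report points
# at the kernel that actually failed rather than a later, unrelated call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Ask autograd to name the backward op that produced the failure.
torch.autograd.set_detect_anomaly(True)

# A backward that hangs only on multiple GPUs is the classic symptom of
# ranks disagreeing on which parameters receive gradients (easy to trigger
# by unfreezing the LLM only in the SFT stage). With vanilla DDP:
model = nn.Linear(8, 8)  # placeholder for the real model
if torch.distributed.is_initialized():
    model = nn.parallel.DistributedDataParallel(
        model.cuda(),
        find_unused_parameters=True,  # tolerate params with no grad this step
    )
```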

This document is generated using: https://github.com/james4ever0/prometheous You can view the website at: https://james4ever0.github.io/RWKV-LM/

I trained a small model with the v5 code, but found there is no inference code under the v5 folder, so I tried the v4neo code and the ChatRWKV chat code, and got these errors: Loading... RWKV_HEAD_QK_DIM 0 RWKV_JIT_ON 1 loading... /workspace/RWKV-LM/RWKV-v5/model/0.1-1/rwkv-50 emb.weight float16 cpu blocks.0.ln1.weight float16 cuda:0 blocks.0.ln1.bias float16 cuda:0 blocks.0.ln2.weight float16 cuda:0 blocks.0.ln2.bias float16 cuda:0 blocks.0.ln0.weight float16 cuda:0 blocks.0.ln0.bias float16 cuda:0 blocks.0.att.time_mix_k...

While finetuning RWKV, I use this script (using the demo dataset from `make_data.py`, with `demo.bin` and `demo.idx` placed in `./data`):
```
#!/bin/bash
BASE_NAME="model/demo"
N_LAYER="12"
N_EMBD="768"
M_BSZ="16" # takes 16G VRAM (reduce this...
```

(rwkv5_py310) root@autodl-container-f97d11abac-813971fc:~/autodl-tmp/RWKV-LM-main/RWKV-v5# ./demo-training-run.sh
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb972vb43
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb972vb43/_remote_module_non_scriptable.py
INFO:pytorch_lightning.utilities.rank_zero:########## work in progress ##########
/root/miniconda3/envs/rwkv5_py310/lib/python3.10/site-packages/pydantic/_internal/_config.py:321: UserWarning: Valid config keys have changed in V2: * 'allow_population_by_field_name' has been renamed...

Also fixed a bug caused by this in `MishGLU`.