
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, sa...
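To make the "RNN at inference, GPT-style parallel at training" claim concrete, here is a minimal, unoptimized sketch of the core WKV recurrence from RWKV-v4 (single sequence, one head, no numerical-stability tricks; the repo's CUDA kernels compute the same quantity, in parallel, for training):

```python
import torch

def wkv_recurrent(w, u, k, v):
    """Naive RWKV WKV recurrence.
    w, u: (C,) per-channel decay rate and current-token bonus
    k, v: (T, C) keys and values; returns (T, C) outputs."""
    T, C = k.shape
    num = torch.zeros(C)  # decayed running sum of exp(k_i) * v_i
    den = torch.zeros(C)  # decayed running sum of exp(k_i)
    out = torch.empty(T, C)
    for t in range(T):
        cur = torch.exp(u + k[t])                  # current token gets bonus u
        out[t] = (num + cur * v[t]) / (den + cur)  # weighted average of values
        decay = torch.exp(-w)                      # state decays every step
        num = decay * num + torch.exp(k[t]) * v[t]
        den = decay * den + torch.exp(k[t])
    return out
```

Because the whole state is just (num, den), inference costs O(1) memory per token like an RNN, while training can unroll the same computation over T positions in parallel.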

Results: 109 RWKV-LM issues, sorted by recently updated.

Guidance is a high-level wrapper library that embeds prompt parameters in natural language, and it combines a variety of techniques that can produce output matching an exact schema to...
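For context, the core idea behind such libraries is that fixed template text is emitted verbatim and only the slots are generated under constraints, so the result always matches the schema. A toy illustration of that idea only (not guidance's actual API; `fake_generate` is a placeholder for a constrained model call):

```python
import re

# Toy "constrained decoding": the template fixes the schema, only the
# slots are generated, so the result always parses.
template = '{{"name": "{name}", "age": {age}}}'

def fake_generate(slot, pattern):
    # stand-in for a model call that is forced to match `pattern`
    sample = {"name": "Alice", "age": "30"}[slot]
    assert re.fullmatch(pattern, sample)
    return sample

output = template.format(
    name=fake_generate("name", r"[A-Za-z ]+"),
    age=fake_generate("age", r"\d+"),
)
print(output)  # always valid JSON with exactly the keys the schema demands
```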

Just added a fix for the deepspeed import, which causes an error later.

Hi all, I have trained RWKV-v4neo from scratch. After going through some issues, it seems that I need to execute run.py in RWKV-v4 to test my model. I changed the...
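If run.py in RWKV-v4 proves awkward, one common alternative (an assumption on my part, not the repo's official answer) is to load the checkpoint with the `rwkv` pip package from ChatRWKV; a minimal sketch, with the model path and strategy string as placeholders for your own setup:

```python
# pip install rwkv  (the ChatRWKV inference package)
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="out/rwkv-final", strategy="cuda fp16")  # your .pth, no extension
pipeline = PIPELINE(model, "20B_tokenizer.json")            # tokenizer file from ChatRWKV

args = PIPELINE_ARGS(temperature=1.0, top_p=0.85)
print(pipeline.generate("The quick brown fox", token_count=64, args=args))
```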

Thank you for your work. I am borrowing your RWKV structure to try to implement a multimodal VLM. Here RWKV plays a Qformer-like role, i.e. a ViT->RWKV->LLM pipeline, trained in two stages (pretrain & SFT) using DeepSpeed. However, many problems appear during backpropagation, mainly these two:

1. When doing forward and backward with the custom CUDA kernel, the pretrain stage is fine (ViT and LLM frozen, training only the RWKV in the projector layer), but after unfreezing the LLM in the SFT stage I keep getting "CUDA ERROR: an illegal memory access was encountered". The error location varies, but it definitely occurs during backpropagation; if I comment out the RWKV module (keeping only a linear layer to convert dimensions in the forward pass), there is no problem.
2. When not using the custom CUDA kernel and doing the forward pass with the rwkv_linear_attention_cpu function instead (although this function was written for CPU execution, my understanding is that it actually implements RWKV's core computation, and as long as key's device is cuda these operations still run on the GPU). The problem with this function is that after a batch finishes its forward pass, the backward pass waits indefinitely until timeout (it only deadlocks with multiple GPUs; I suspect the multi-GPU gradient aggregation is at fault, since single-GPU training with this function works fine).

pengbo, could you give some feedback and a tentative analysis? The LLM itself is only 1.8B and the batch size is small; I have also been monitoring the A100 80GB card, and GPU memory is not exceeded.

P.S.: With the custom-CUDA-kernel forward and backward, there is also a small chance of this error: File python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward...
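Not an official fix, but a sketch of how one might start localizing the two failures above, assuming plain PyTorch DDP semantics underneath (DeepSpeed has its own configuration for this; `model` below is a stand-in for the real ViT->RWKV->LLM stack):

```python
import os
import torch
import torch.nn as nn

# Make CUDA errors synchronous so the illegal-memory-access report points
# at the kernel that actually failed rather than a later, unrelated call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Ask autograd to name the backward op that produced the failure.
torch.autograd.set_detect_anomaly(True)

# A backward that hangs only on multiple GPUs is the classic symptom of
# ranks disagreeing on which parameters receive gradients (easy to trigger
# by unfreezing the LLM only in the SFT stage). With vanilla DDP:
model = nn.Linear(8, 8)  # placeholder for the real model
if torch.distributed.is_initialized():
    model = nn.parallel.DistributedDataParallel(
        model.cuda(),
        find_unused_parameters=True,  # tolerate params with no grad this step
    )
```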

This document is generated using: https://github.com/james4ever0/prometheous You can view the website at: https://james4ever0.github.io/RWKV-LM/

I trained a small model with the v5 code, but found there is no inference code under the v5 folder, so I tried the v4neo code and the ChatRWKV chat code, and got these errors: Loading... RWKV_HEAD_QK_DIM 0 RWKV_JIT_ON 1 loading... /workspace/RWKV-LM/RWKV-v5/model/0.1-1/rwkv-50 emb.weight float16 cpu blocks.0.ln1.weight float16 cuda:0 blocks.0.ln1.bias float16 cuda:0 blocks.0.ln2.weight float16 cuda:0 blocks.0.ln2.bias float16 cuda:0 blocks.0.ln0.weight float16 cuda:0 blocks.0.ln0.bias float16 cuda:0 blocks.0.att.time_mix_k...

While finetuning RWKV, I use this script (using the demo dataset from `make_data.py`, with `demo.bin` and `demo.idx` placed in `./data`):
```
#!/bin/bash
BASE_NAME="model/demo"
N_LAYER="12"
N_EMBD="768"
M_BSZ="16" # takes 16G VRAM (reduce this...
```

(rwkv5_py310) root@autodl-container-f97d11abac-813971fc:~/autodl-tmp/RWKV-LM-main/RWKV-v5# ./demo-training-run.sh
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb972vb43
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb972vb43/_remote_module_non_scriptable.py
INFO:pytorch_lightning.utilities.rank_zero:########## work in progress ##########
/root/miniconda3/envs/rwkv5_py310/lib/python3.10/site-packages/pydantic/_internal/_config.py:321: UserWarning: Valid config keys have changed in V2: * 'allow_population_by_field_name' has been renamed...

Also fixed a bug caused by this in `MishGLU`.