RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, sa...

Results 109 RWKV-LM issues

Mixtral 8x7b is out. Is there any plan for RWKV to support MoE in the future, with inference speedup? Looking forward to it.

Same versions: deepspeed==0.7.0 pytorch-lightning==1.9.2 torch 1.13.1+cu117; Traceback (most recent call last): File "summarization_pipeline.py", line 1382, in main() File "summarization_pipeline.py", line 1376, in main train_ds(configs) File "summarization_pipeline.py", line 1040, in train_ds trainer.run(model=model,...

torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 deepspeed 0.12.4 pytorch-lightning 2.1.2 It reports this error: AttributeError: 'MyDataset' object has no attribute 'global_rank'

Could the author provide a list of requirements?

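It seems nobody has applied RWKV or RetNet to OCR tasks. For longer text, for example lengths of 2048 or 4096, decoding is expensive, but replacing the decoder with RWKV would be a very good solution in terms of length, cost, and speed. I searched for material and found no one doing this. I tried it myself but could not understand the usage. Would you be willing to publish a decoder tutorial or help me refactor my code? I believe RWKV could be a rising star in the OCR field.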

I ran the code below: `wkv_cuda = load(name="wkv", sources=["cuda/wkv_op.cpp", "cuda/wkv_cuda.cu"], verbose=True, extra_cuda_cflags=['-res-usage', '--maxrregcount 60', '--use_fast_math', '-O3', '-Xptxas -O3', f'-DTmax={T_MAX}'])` and got this: Traceback (most recent call last): File "d:\GitHub\S_GPT\src\model.py", line...

Emitting ninja build file /home/hope/.cache/torch_extensions/py310_cu117/wkv_1024/build.ninja... Building extension module wkv_1024... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/2] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\"...

Hello author, since Hugging Face is currently inaccessible from within China, is there anywhere else the models can be reliably downloaded?

As the title says: I see there are CHNTuned-only and JPNTuned-only versions on Hugging Face. Could you provide a version that is both CHN and JPN tuned?

It reports a dimension mismatch between 3072 and 2688. Observation: one is emb times 3.5 and the other is emb times 4. So I manually changed --dim_ffn to 3072, but: `RuntimeError: Error(s) in loading state_dict for RWKV: Missing key(s) in state_dict: "blocks.0.att.time_mix_g", "blocks.0.att.time_faaaa", "blocks.0.att.gate.weight", "blocks.1.att.time_mix_g", "blocks.1.att.time_faaaa", "blocks.1.att.gate.weight", "blocks.2.att.time_mix_g", "blocks.2.att.time_faaaa", "blocks.2.att.gate.weight", "blocks.3.att.time_mix_g", "blocks.3.att.time_faaaa", "blocks.3.att.gate.weight", "blocks.4.att.time_mix_g", "blocks.4.att.time_faaaa", "blocks.4.att.gate.weight", "blocks.5.att.time_mix_g",...
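The arithmetic behind the reporter's observation can be checked with a small sketch. It assumes only what the issue states, that the feed-forward width equals a multiplier (3.5 or 4) times the embedding size. The helper name `implied_n_embd` is hypothetical and not part of RWKV-LM, and the embedding size of 768 is inferred from the two numbers rather than stated in the issue.

```python
# Hypothetical helper (not from RWKV-LM): recover the embedding size
# that a given FFN width implies under a given multiplier.
def implied_n_embd(dim_ffn: int, multiplier: float) -> float:
    return dim_ffn / multiplier

# The two widths from the error message, paired with the two multipliers
# the reporter observed.
from_small = implied_n_embd(2688, 3.5)
from_large = implied_n_embd(3072, 4.0)

# Both widths point at the same embedding size, so the mismatch is
# consistent with one side using 3.5x and the other using 4x.
print(from_small, from_large)
```

Note that the missing keys listed in the traceback are all under `att.*`, so they concern attention parameters rather than the feed-forward width. Changing `--dim_ffn` alone would therefore not be expected to supply them.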