RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, sa...
Mixtral 8x7b is out. Is there any plan for RWKV to support MoE in the future, with inference speedup? Looking forward to it.
deepspeed==0.7.0 pytorch-lightning==1.9.2 torch 1.13.1+cu117 (same versions); Traceback (most recent call last): File "summarization_pipeline.py", line 1382, in main() File "summarization_pipeline.py", line 1376, in main train_ds(configs) File "summarization_pipeline.py", line 1040, in train_ds trainer.run(model=model,...
torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 deepspeed 0.12.4 pytorch-lightning 2.1.2 raises the error: AttributeError: 'MyDataset' object has no attribute 'global_rank'
Could the author provide a requirements file?
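Until an official file exists, a hypothetical requirements sketch can be assembled from the environments mentioned in the reports above. Every pin here is an assumption taken from those reports, not an official recommendation:

```
# Hypothetical requirements.txt, assembled only from version
# combinations reported in these issues; not an official pin list.
torch==1.13.1+cu117
pytorch-lightning==1.9.2
deepspeed==0.7.0
```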
It seems nobody has applied RWKV or RetNet to OCR tasks yet. For longer texts, e.g. 2048 or 4096 tokens, decoding is expensive, but swapping the decoder for RWKV would be an excellent solution for length, cost, and speed all at once. I searched around and found no one doing this; I tried it myself but could not figure out the usage. Would you be willing to publish a decoder tutorial, or help me refactor my code? I believe RWKV could be a rising star in the OCR field.
I ran the code below: wkv_cuda = load(name="wkv", sources=["cuda/wkv_op.cpp", "cuda/wkv_cuda.cu"], verbose=True, extra_cuda_cflags=['-res-usage', '--maxrregcount 60', '--use_fast_math', '-O3', '-Xptxas -O3', f'-DTmax={T_MAX}']) and got this: Traceback (most recent call last): File "d:\GitHub\S_GPT\src\model.py", line...
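One thing this snippet depends on is T_MAX being defined before load() is called, since the kernel is compiled with the context length baked in via -DTmax. A minimal sketch of just the flag construction follows (T_MAX = 1024 is an assumed value; the load() call itself needs a working nvcc toolchain, so it is omitted here):

```python
# Sketch: build the extra_cuda_cflags list that the report above passes
# to torch.utils.cpp_extension.load(). T_MAX is an assumed context
# length; it must be at least the training ctx_len, because the CUDA
# kernel fixes its maximum sequence length from Tmax at compile time.
T_MAX = 1024  # assumption for illustration

extra_cuda_cflags = [
    "-res-usage",
    "--maxrregcount 60",
    "--use_fast_math",
    "-O3",
    "-Xptxas -O3",
    f"-DTmax={T_MAX}",  # bakes the context-length bound into the kernel
]
```

If the build fails, running with verbose=True (as in the report) makes ninja print the full nvcc command line, which usually pinpoints which flag or path the toolchain rejected.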
Emitting ninja build file /home/hope/.cache/torch_extensions/py310_cu117/wkv_1024/build.ninja... Building extension module wkv_1024... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/2] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\"...
Hello, since huggingface is currently inaccessible from mainland China, is there another place where the models can be downloaded reliably?
As the title says: since there are separate CHNTuned and JPNTuned versions on huggingface, could you provide a combined CHN+JPNTuned version?
It reports that the dimensions 3072 and 2688 do not match. Observation: one is emb × 3.5 and the other is emb × 4, so I manually changed --dim_ffn to 3072, but then: `RuntimeError: Error(s) in loading state_dict for RWKV: Missing key(s) in state_dict: "blocks.0.att.time_mix_g", "blocks.0.att.time_faaaa", "blocks.0.att.gate.weight", "blocks.1.att.time_mix_g", "blocks.1.att.time_faaaa", "blocks.1.att.gate.weight", "blocks.2.att.time_mix_g", "blocks.2.att.time_faaaa", "blocks.2.att.gate.weight", "blocks.3.att.time_mix_g", "blocks.3.att.time_faaaa", "blocks.3.att.gate.weight", "blocks.4.att.time_mix_g", "blocks.4.att.time_faaaa", "blocks.4.att.gate.weight", "blocks.5.att.time_mix_g",...
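The 2688-vs-3072 mismatch is consistent with the two FFN sizing rules the report names (emb × 3.5 vs emb × 4), assuming an embedding size of 768; and the missing time_mix_g / time_faaaa / gate.weight keys suggest the checkpoint and the model code come from different RWKV versions, so changing --dim_ffn alone cannot reconcile them. A hedged sketch of the arithmetic (n_embd = 768 is an assumption, chosen because it reproduces both numbers):

```python
# Sketch: the two FFN width conventions implied by the report above.
# n_embd = 768 is an assumed value: 768 * 4 = 3072 and 768 * 3.5 = 2688,
# the two dimensions appearing in the error message.
def dim_ffn_x4(n_embd: int) -> int:
    # one convention: 4x the embedding size
    return n_embd * 4

def dim_ffn_x3_5(n_embd: int) -> int:
    # the other convention: 3.5x, rounded down to a multiple of 32
    # (the rounding rule is an assumption for illustration)
    return int(n_embd * 3.5) // 32 * 32

print(dim_ffn_x4(768))    # 3072
print(dim_ffn_x3_5(768))  # 2688
```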