RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNN and transformer: great performance, fast inference, sa...
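For readers new to the architecture, here is a heavily simplified sketch of the kind of time-mixing recurrence RWKV runs in RNN mode, treating one channel as a scalar. This is illustrative only (names and the scalar treatment are mine), not the repository's actual numerically stabilized kernel; note that the bare `exp(k)` here can overflow, which is exactly what the last issue below asks about.

```python
import numpy as np

def wkv_step(w, u, k, v, state):
    """One unstabilized WKV-style step: an exponentially decaying
    weighted average over past (k, v) pairs, carried in a tiny
    fixed-size state instead of a growing attention cache."""
    aa, bb = state                              # running weighted sums
    y = (aa + np.exp(u + k) * v) / (bb + np.exp(u + k))  # this token's output
    aa = np.exp(-w) * aa + np.exp(k) * v        # decay history, add new token
    bb = np.exp(-w) * bb + np.exp(k)
    return y, (aa, bb)

# Tiny demo: the state stays O(1) per channel, which is why
# inference is RNN-fast regardless of context length.
state = (0.0, 0.0)
for k, v in [(0.1, 1.0), (0.5, 2.0), (-0.2, 3.0)]:
    y, state = wkv_step(w=0.3, u=0.2, k=k, v=v, state=state)
    print(y)
```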

Results: 109 RWKV-LM issues

Hi, I was wondering whether this model can achieve [GPT-4](https://openai.com/research/gpt-4) level performance on the HumanEval benchmark, a proxy for effectiveness at code generation. I'm fine if I have to train...

Reduced unnecessary copying in the code by optimizing the slicing and appending operations. These changes should result in improved performance.
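The PR excerpt does not include the diff, but the kind of change it describes typically looks like the following hypothetical before/after (`model.sample` is a placeholder, not a function from this repository):

```python
# Copy-heavy version: the defensive slice and the list concatenation
# each allocate a new list proportional to the sequence length.
def generate_slow(model, tokens, n):
    for _ in range(n):
        t = model.sample(tokens[:])   # tokens[:] copies the whole list
        tokens = tokens + [t]         # builds yet another full copy
    return tokens

# Reduced copying: mutate a single list in place.
def generate_fast(model, tokens, n):
    for _ in range(n):
        t = model.sample(tokens)      # no defensive copy needed
        tokens.append(t)              # amortized O(1), no new list
    return tokens
```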

It would be better for the project to follow the [PEP8 Python code style](https://peps.python.org/pep-0008/), so I created a formatter configuration with [pre-commit](https://pre-commit.com/). A sample configuration:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: ...
```
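The excerpt cuts off at the `rev` pin. For reference, a fuller configuration along the same lines might look like this; the hook choices are suggestions and the `rev` values are placeholders that would need to be pinned to current releases:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0  # placeholder: pin to a current release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 23.3.0  # placeholder: pin to a current release
    hooks:
      - id: black
```

Once the file is in place, `pip install pre-commit` followed by `pre-commit install` wires the hooks into `git commit`.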

HuggingFace -> Hugging Face

Because Linux has a page cache, loading the model file under WSL2 at startup needs twice the model file's size in memory. I found a simple fix for this: right after reading, tell the operating system to release the corresponding memory.

```python
import time

def file_cleaner(file):
    last_pos = 0

    def cleaner():
        nonlocal last_pos
        print("cleaner start")
        while True:
            time.sleep(0.1)
            pos = file.tell()
            if pos > last_pos:
                print("cleaner clean %d to %d" % (last_pos, pos))
                ...
```
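The excerpt is truncated before the actual release call, but the "tell the OS" step is presumably something like `posix_fadvise` with `POSIX_FADV_DONTNEED`. A self-contained sketch of that mechanism (the file path and 1 MiB chunk size are placeholders):

```python
import os

def drop_page_cache(fd, start, end):
    # Ask the kernel to evict the already-read byte range [start, end)
    # from the page cache. POSIX_FADV_DONTNEED is advisory: the kernel
    # may ignore it, and it is only available on Unix-like systems.
    os.posix_fadvise(fd, start, end - start, os.POSIX_FADV_DONTNEED)

with open("model.pth", "rb") as f:   # "model.pth" is a placeholder path
    while f.read(1 << 20):           # stream in 1 MiB chunks
        # Simplest possible sketch: re-evict everything read so far.
        # A real cleaner would track the last evicted position instead.
        drop_page_cache(f.fileno(), 0, f.tell())
```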

During training, I have noticed that the CUDA code finishes all calculations within the ctx_len; the speed is fast, but this seems memory-unfriendly for applications with long context lengths...
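The excerpt stops before any proposal, but a common way to trade speed for memory with a recurrent formulation is to process the context in fixed-size chunks and carry the recurrent state across chunk boundaries. A hypothetical sketch (`rnn_cell` is an assumed callable, not this repository's API):

```python
import torch

def forward_chunked(rnn_cell, x, chunk_len=512):
    """Process a (T, C) sequence in fixed-size chunks, carrying the
    recurrent state forward, so peak activation memory scales with
    chunk_len rather than the full context length.
    rnn_cell: (chunk, state) -> (outputs, state)."""
    state = None
    outputs = []
    for start in range(0, x.shape[0], chunk_len):
        y, state = rnn_cell(x[start:start + chunk_len], state)
        outputs.append(y)
    return torch.cat(outputs, dim=0)
```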

I saw some code under [RWKV-LM/RWKV-v4neo/src/model.py](https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v4neo/src/model.py) which requires CUDA to create an RWKV model. I want to change the code by replacing the first embedding layer with a linear layer to...
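The request is truncated, but the swap it describes usually amounts to the following (a hypothetical sketch: the dimensions are placeholders, and this only changes the input layer, not the custom CUDA kernel requirement):

```python
import torch
import torch.nn as nn

# nn.Embedding maps integer token ids to vectors...
emb = nn.Embedding(num_embeddings=50277, embedding_dim=768)  # placeholder sizes
tok_ids = torch.randint(0, 50277, (1, 16))
h = emb(tok_ids)                       # shape (1, 16, 768)

# ...whereas nn.Linear lets you feed continuous feature vectors instead.
proj = nn.Linear(in_features=128, out_features=768)  # 128 is a placeholder
feats = torch.randn(1, 16, 128)
h = proj(feats)                        # same (1, 16, 768) shape downstream
```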

Hi, I have one tiny question about the CUDA kernel. In the code, `aa` and `bb` are running sums. To avoid overflow, you multiplied by `exp(-p)` both when computing `y[ii]` and...
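For context, the trick being asked about is the standard log-sum-exp rescaling: keep the running sums stored at the scale of a running maximum exponent `p`, so no bare `exp()` of a large number is ever evaluated. A simplified Python rendering of the idea (the decay `w` is omitted here, and this is not the actual kernel code):

```python
import math

def wkv_stable_step(u, k, v, state):
    """aa and bb are stored scaled by exp(-pp), where pp is the running
    maximum exponent; every exp() below sees a non-positive argument."""
    aa, bb, pp = state
    # Output: combine history (at scale pp) with this token (at scale u + k).
    p = max(pp, u + k)
    e1, e2 = math.exp(pp - p), math.exp(u + k - p)
    y = (e1 * aa + e2 * v) / (e1 * bb + e2)
    # State update: fold this token (at scale k) into the scaled history.
    p = max(pp, k)
    e1, e2 = math.exp(pp - p), math.exp(k - p)
    return y, (e1 * aa + e2 * v, e1 * bb + e2, p)
```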