
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, sa...

109 RWKV-LM issues

Hello, I am trying out RWKV with the audio modality, and when I set T_MAX >> 1000 it throws this error: ``` Emitting ninja build file /root/.cache/torch_extensions/py39_cu116/timex/build.ninja... Building extension module timex... Allowing ninja...
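For context, RWKV-LM's custom CUDA kernel is typically compiled with a maximum sequence length baked in at build time, and very large values can exceed the GPU's shared-memory budget, which is one plausible cause of a build failure like the one above. Below is a minimal sketch of recompiling with a larger limit; the `timex` source paths and the `-DTmax` flag are assumptions, so check how `src/model.py` actually invokes the build before copying this:

```python
# Hypothetical sketch: recompiling the timex CUDA kernel with a larger T_MAX.
# Source paths and the -DTmax flag are assumptions -- verify against the
# actual build call in RWKV-LM's src/model.py.
from torch.utils.cpp_extension import load

T_MAX = 4096  # must be >= the ctx_len you train/infer with

timex_cuda = load(
    name="timex",
    sources=["cuda/timex_op.cpp", "cuda/timex_cuda.cu"],  # assumed paths
    verbose=True,
    extra_cuda_cflags=["--use_fast_math", f"-DTmax={T_MAX}"],
)
```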

Hello! I'm very interested in this work, since I'm currently using transformer-based models for classification tasks. With transformers or RNNs, classification usually takes the final element of each channel from the last block as the output and maps it to the classes through a fully connected layer. Do you think RWKV works on a similar principle? Is it still safe to take the last element as the output? I'd appreciate any advice!
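Since RWKV is an RNN, the hidden state at the final time step summarizes the whole sequence, so the usual last-element recipe carries over naturally. A minimal sketch, assuming a hypothetical `backbone` module that returns hidden states of shape `(batch, seq_len, d_model)` (RWKV-LM's actual model class may differ):

```python
# Sketch of a classification head on an RWKV-style backbone. `backbone` is a
# placeholder assumed to return hidden states of shape (B, T, d_model).
import torch
import torch.nn as nn

class RWKVClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(idx)   # (B, T, d_model)
        last = hidden[:, -1, :]       # last time step, the usual RNN choice
        return self.head(last)        # (B, num_classes) logits
```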

Hi @BlinkDL! First off, this is amazing and seems very promising for scaling down large Transformers to be more production-friendly. I'm wondering if you have any benchmarks regarding VRAM...
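Absent published numbers, one rough way to benchmark VRAM yourself is to read the CUDA allocator's peak around a forward pass; note this captures PyTorch's allocations, not total process memory. Here `model` and `tokens` are placeholders for a loaded RWKV checkpoint and an input batch:

```python
# Measure peak allocated VRAM around a single forward pass.
import torch

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    _ = model(tokens)  # placeholder model and input batch
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gib:.2f} GiB")
```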

Hi, really exciting project! I'm wondering if you've published the model conversion script that you used to create the [js_models](https://github.com/BlinkDL/AI-Writer/tree/main/docs/eng/js_model) files from the `.pth` model file? It would be *awesome*...
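The conversion script itself isn't published in this thread, so the following is only a guess at the general shape: load the `.pth` state dict and dump each tensor in a form JavaScript can read. JSON is used purely for illustration (a real exporter for models this size would almost certainly write a compact binary format), and the file names are hypothetical:

```python
# Hypothetical sketch of a .pth -> JS-readable export; not the actual script.
import json
import torch

state = torch.load("model.pth", map_location="cpu")
export = {
    name: {"shape": list(t.shape), "data": t.float().flatten().tolist()}
    for name, t in state.items()
}
with open("model.json", "w") as f:
    json.dump(export, f)
```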

It is awesome and interesting. I wonder if there is any paper about RWKV? Thanks.

Is there a training plan for this project in other languages (e.g. Japanese)?

Hi there. You mention in the readme that you're interested in potentially adding some special tokens/markers to represent stuff like capitalisation. Just wanted to let you know we tried that...
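For readers unfamiliar with the idea being discussed, the usual trick looks like this: lowercase the text and insert an explicit marker token before each originally capitalised word, so the vocabulary doesn't need separate cased variants. The marker character and the initial-caps-only handling below are illustrative choices, not the repo's scheme:

```python
# Illustrative capitalisation-marker preprocessing; only handles a leading
# capital per word, and the marker character is arbitrary.
CAP = "\u2191"  # "↑", a hypothetical capitalisation marker

def mark_caps(text: str) -> str:
    out = []
    for word in text.split():
        if word[:1].isupper():
            out.append(CAP + word.lower())
        else:
            out.append(word)
    return " ".join(out)

print(mark_caps("The model is an RNN"))  # ↑the model is an ↑rnn
```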

Hi, I was training the model locally from scratch. ```shell python train.py --load_model --wandb --proj_dir out --data_file ../data/enwik8 --data_type utf-8 --vocab_size 0 --ctx_len 512 --epoch_steps 5000 --epoch_count 500 --epoch_begin 0...
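One detail worth noting in commands like the one above: with `--epoch_steps 5000` and `--ctx_len 512`, an "epoch" is a fixed number of steps rather than a full pass over enwik8, so tokens per epoch is just a product of the flags. The batch size below is an assumption, since that flag is cut off in the preview:

```python
# Back-of-envelope tokens-per-epoch from the flags shown above.
epoch_steps = 5000
ctx_len = 512
micro_bsz = 12  # assumption: the batch-size flag is not visible in the preview

tokens_per_epoch = epoch_steps * ctx_len * micro_bsz
print(f"{tokens_per_epoch:,} tokens per epoch")  # 30,720,000
```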

Hi @BlinkDL! Really interested in your work here. I am looking to test out some of the models for embedding-based tasks. What is the best way to access...
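In the meantime, a common recipe for embedding tasks with an RNN-style LM is to run the text through the backbone and pool the per-token hidden states. A hedged sketch, where `backbone` and `tokenize` are hypothetical placeholders for RWKV-LM's actual inference API:

```python
# Mean-pooled sentence embeddings from a hypothetical RWKV-style backbone.
import torch

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    idx = tokenize(text)                  # hypothetical helper -> (1, T) token ids
    hidden = backbone(idx)                # hypothetical backbone -> (1, T, d_model)
    return hidden.mean(dim=1).squeeze(0)  # (d_model,) sentence embedding
```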