RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer - great performance, fast inference, sa...

109 RWKV-LM issues, sorted by recently updated

#!/bin/bash
BASE_NAME="./model/models--RWKV--HF_v5-Eagle-7B/snapshots/bb01ae9434eb9f4934c1ebe486eb7d3e25883d72/pytorch_model.bin"
N_LAYER="32"
N_EMBD="4096"
M_BSZ="16"       # takes 16G VRAM (reduce this to save VRAM)
LR_INIT="1e-5"
LR_FINAL="1e-5"
GRAD_CP=0        # set to 1 to save VRAM (will be slower)
EPOCH_SAVE=10
# magic_prime...

Dear developers, I want to use ROUGE to benchmark RWKV-v4 on some summarization tasks. Is it suitable? Model: RWKV-4-Pile-1B5-20220903-8040
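ROUGE only compares generated text against reference summaries, so nothing about RWKV prevents using it; any checkpoint's outputs can be scored. A minimal sketch assuming the rouge_score package (pip install rouge-score) and that you have already collected (reference, model output) pairs from RWKV-4-Pile-1B5 - the evaluation helper below is not part of the repo:

    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    def rouge_eval(samples):
        # samples: list of (reference_summary, model_summary) string pairs
        totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
        for ref, hyp in samples:
            scores = scorer.score(ref, hyp)   # score(target, prediction)
            for k in totals:
                totals[k] += scores[k].fmeasure
        return {k: v / len(samples) for k, v in totals.items()}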

How to train RWKV-5-World-1B5-v2 model

Seems like the current RWKVWorldTokenizer on HuggingFace does not do truncation, even though I set truncation=True. Is this a deliberate decision? Is there some other way to do input truncation?
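As a workaround until truncation is honored, the encoded ids can be truncated by hand after tokenization. A sketch; the repo id below matches the Eagle-7B checkpoint mentioned elsewhere on this page but is otherwise an assumption, and any World-tokenizer checkpoint works the same way:

    from transformers import AutoTokenizer

    # repo id is an assumption; use whichever World-tokenizer checkpoint you load
    tokenizer = AutoTokenizer.from_pretrained("RWKV/HF_v5-Eagle-7B", trust_remote_code=True)

    long_text = "some very long document " * 1000
    max_len = 2048

    ids = tokenizer(long_text)["input_ids"]
    ids = ids[:max_len]   # manual truncation, since truncation=True is currently ignored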

Is it possible to run RWKV-5 World on Colab, or is it too big? If so, are there any examples?
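The 7B World model needs roughly 14-15 GB of VRAM in fp16, so it is borderline on a free Colab T4, but the smaller World checkpoints (0.4B/1B5/3B) run comfortably. A rough sketch using the rwkv pip package; the checkpoint filename is an assumption and the .pth file must be downloaded from the BlinkDL Hugging Face repos first:

    # pip install rwkv
    import os
    os.environ["RWKV_JIT_ON"] = "1"
    os.environ["RWKV_CUDA_ON"] = "0"   # "1" compiles the CUDA kernel for faster inference

    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE, PIPELINE_ARGS

    # path to the downloaded checkpoint, given without the .pth extension
    model = RWKV(model="RWKV-5-World-1B5-v2-20231025-ctx4096", strategy="cuda fp16")
    pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

    out = pipeline.generate("The capital of France is", token_count=64,
                            args=PIPELINE_ARGS(temperature=1.0, top_p=0.7))
    print(out)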

As stated in the original paper: "Token Shift allows the model to learn how much new versus old information should be allocated per time step to each channel of...
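For intuition, here is a simplified sketch of the mechanism that quote describes: each channel sees a learned blend of the current token and the previous token. The single per-channel coefficient below stands in for the time-mix parameters in the real code and is an illustration, not the repo's implementation:

    import torch
    import torch.nn as nn

    class TokenShiftMix(nn.Module):
        def __init__(self, n_embd):
            super().__init__()
            self.time_shift = nn.ZeroPad2d((0, 0, 1, -1))              # shift the sequence right by one step
            self.mix = nn.Parameter(torch.full((1, 1, n_embd), 0.5))   # per-channel "new vs old" weight

        def forward(self, x):               # x: (B, T, C)
            x_prev = self.time_shift(x)     # token t sees token t-1 (zeros at t = 0)
            return x * self.mix + x_prev * (1 - self.mix)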

Can you intuitively explain what `ratio_0_to_1` is doing in `RWKV_Tmix_x060`? https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290 I find that `ratio_0_to_1` is defined by: `ratio_0_to_1 = layer_id / (args.n_layer - 1)` Then it defines multiple things...
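Intuitively, ratio_0_to_1 is just the layer's relative depth: 0 for the first layer, 1 for the last. It is only used at initialization, so shallow and deep layers start with different decay/mixing curves before training adjusts them. A sketch of the pattern; the constants follow the shape of the initialization in model.py but should be treated as illustrative:

    import torch

    def init_decay_speed(layer_id, n_layer, dim_att):
        ratio_0_to_1 = layer_id / (n_layer - 1)        # relative depth of this layer
        decay_speed = torch.empty(dim_att)
        for n in range(dim_att):
            # deeper layers get a different exponent, so their per-channel
            # decay profile starts out shaped differently from early layers
            decay_speed[n] = -6 + 5 * (n / (dim_att - 1)) ** (0.7 + 1.3 * ratio_0_to_1)
        return decay_speed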

Hello, I would like to use RWKV for speech enhancement. How can I replace the RNN part of my model with RWKV?
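RWKV blocks keep the same (batch, time, channels) interface as an RNN applied to a whole sequence, so the usual approach is to project your frame features to the block width and stack RWKV blocks where the GRU/LSTM used to be. A hypothetical sketch of that pattern; the blocks themselves would come from this repo (e.g. the Block class in RWKV-v5/src/model.py, whose constructor and required args vary by version), and the module below is an illustration rather than a recipe:

    import torch.nn as nn

    class SpeechEnhancer(nn.Module):
        """Drop-in pattern: replace the RNN stack with RWKV blocks mapping (B, T, C) -> (B, T, C)."""
        def __init__(self, n_feat, n_embd, rwkv_blocks):
            super().__init__()
            self.proj_in = nn.Linear(n_feat, n_embd)
            # was: self.rnn = nn.GRU(n_feat, n_embd, batch_first=True)
            self.blocks = nn.ModuleList(rwkv_blocks)   # e.g. [Block(args, i) for i in range(n_layer)]
            self.proj_out = nn.Linear(n_embd, n_feat)

        def forward(self, x):              # x: (B, T, n_feat) spectrogram frames
            x = self.proj_in(x)
            for block in self.blocks:      # each RWKV block keeps the (B, T, C) shape, like an unrolled RNN
                x = block(x)
            return self.proj_out(x)        # per-frame enhancement features / mask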

https://github.com/BlinkDL/RWKV-LM/blob/666f64591e13c68ed6e602e957c5ca47b25750e3/RWKV-v5/cuda/wkv6state_cuda.cu#L15 This line is missing the batch offset and should read: `_s += b*H*_N_*_N_ + h*_N_*_N_ + i*_N_;` Probably why this code didn't work for BPTT when we tried it...
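For reference, the state buffer is laid out as (B, H, N, N), so each thread's base pointer needs the batch stride b*H*N*N in addition to the head and row offsets, which is what the corrected line adds. A quick check of that arithmetic with hypothetical sizes:

    import torch

    B, H, N = 2, 4, 8                          # batch, heads, head size (hypothetical)
    s = torch.arange(B * H * N * N).view(B, H, N, N)

    b, h, i = 1, 3, 5
    # flat offset of element (b, h, i, 0) matches the corrected pointer arithmetic
    assert s[b, h, i, 0].item() == b * H * N * N + h * N * N + i * N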