RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer - great performance, fast inference, sa...

109 RWKV-LM issues, sorted by recently updated

#!/bin/bash
BASE_NAME="./model/models--RWKV--HF_v5-Eagle-7B/snapshots/bb01ae9434eb9f4934c1ebe486eb7d3e25883d72/pytorch_model.bin"
N_LAYER="32"
N_EMBD="4096"
M_BSZ="16"       # takes 16G VRAM (reduce this to save VRAM)
LR_INIT="1e-5"
LR_FINAL="1e-5"
GRAD_CP=0        # set to 1 to save VRAM (will be slower)
EPOCH_SAVE=10
# magic_prime...

Dear developers, I want to use ROUGE to benchmark RWKV-v4 on some summarization tasks. Is it suitable? Model: RWKV-4-Pile-1B5-20220903-8040
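ROUGE only compares generated text against reference summaries, so nothing about RWKV prevents using it; any checkpoint's outputs can be scored. A minimal sketch assuming the rouge_score package (pip install rouge-score) and that you have already collected (reference, model output) pairs from RWKV-4-Pile-1B5 - the evaluation helper below is not part of the repo:

    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    def rouge_eval(samples):
        # samples: list of (reference_summary, model_summary) string pairs
        totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
        for ref, hyp in samples:
            scores = scorer.score(ref, hyp)   # score(target, prediction)
            for k in totals:
                totals[k] += scores[k].fmeasure
        return {k: v / len(samples) for k, v in totals.items()}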

How to train RWKV-5-World-1B5-v2 model

Seems like the current RWKVWorldTokenizer on HuggingFace does not do truncation, even though I set truncation=True. Is this a deliberate decision? Is there some other way to do input truncation?
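As a workaround until truncation is honored, the encoded ids can be truncated by hand after tokenization. A sketch; the repo id below matches the Eagle-7B checkpoint mentioned elsewhere on this page but is otherwise an assumption, and any World-tokenizer checkpoint works the same way:

    from transformers import AutoTokenizer

    # repo id is an assumption; use whichever World-tokenizer checkpoint you load
    tokenizer = AutoTokenizer.from_pretrained("RWKV/HF_v5-Eagle-7B", trust_remote_code=True)

    long_text = "some very long document " * 1000
    max_len = 2048

    ids = tokenizer(long_text)["input_ids"]
    ids = ids[:max_len]   # manual truncation, since truncation=True is currently ignored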

Is it possible to run RWKV-5 World on Colab, or is it too big? If so, are there any examples?
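The 7B World model needs roughly 14-15 GB of VRAM in fp16, so it is borderline on a free Colab T4, but the smaller World checkpoints (0.4B/1B5/3B) run comfortably. A rough sketch using the rwkv pip package; the checkpoint filename is an assumption and the .pth file must be downloaded from the BlinkDL Hugging Face repos first:

    # pip install rwkv
    import os
    os.environ["RWKV_JIT_ON"] = "1"
    os.environ["RWKV_CUDA_ON"] = "0"   # "1" compiles the CUDA kernel for faster inference

    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE, PIPELINE_ARGS

    # path to the downloaded checkpoint, given without the .pth extension
    model = RWKV(model="RWKV-5-World-1B5-v2-20231025-ctx4096", strategy="cuda fp16")
    pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

    out = pipeline.generate("The capital of France is", token_count=64,
                            args=PIPELINE_ARGS(temperature=1.0, top_p=0.7))
    print(out)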

As stated in the original paper: "Token Shift allows the model to learn how much new versus old information should be allocated per time step to each channel of...
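For intuition, here is a simplified sketch of the mechanism that quote describes: each channel sees a learned blend of the current token and the previous token. The single per-channel coefficient below stands in for the time-mix parameters in the real code and is an illustration, not the repo's implementation:

    import torch
    import torch.nn as nn

    class TokenShiftMix(nn.Module):
        def __init__(self, n_embd):
            super().__init__()
            self.time_shift = nn.ZeroPad2d((0, 0, 1, -1))              # shift the sequence right by one step
            self.mix = nn.Parameter(torch.full((1, 1, n_embd), 0.5))   # per-channel "new vs old" weight

        def forward(self, x):               # x: (B, T, C)
            x_prev = self.time_shift(x)     # token t sees token t-1 (zeros at t = 0)
            return x * self.mix + x_prev * (1 - self.mix)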

Can you intuitively explain what `ratio_0_to_1` is doing in `RWKV_Tmix_x060`? https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290 I find that `ratio_0_to_1` is defined by: `ratio_0_to_1 = layer_id / (args.n_layer - 1)` Then it defines multiple things...
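Intuitively, ratio_0_to_1 is just the layer's relative depth: 0 for the first layer, 1 for the last. It is only used at initialization, so shallow and deep layers start with different decay/mixing curves before training adjusts them. A sketch of the pattern; the constants follow the shape of the initialization in model.py but should be treated as illustrative:

    import torch

    def init_decay_speed(layer_id, n_layer, dim_att):
        ratio_0_to_1 = layer_id / (n_layer - 1)        # relative depth of this layer
        decay_speed = torch.empty(dim_att)
        for n in range(dim_att):
            # deeper layers get a different exponent, so their per-channel
            # decay profile starts out shaped differently from early layers
            decay_speed[n] = -6 + 5 * (n / (dim_att - 1)) ** (0.7 + 1.3 * ratio_0_to_1)
        return decay_speed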

Hello, I would like to use RWKV for speech enhancement. How can I replace the RNN part of my model with RWKV?
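RWKV blocks keep the same (batch, time, channels) interface as an RNN applied to a whole sequence, so the usual approach is to project your frame features to the block width and stack RWKV blocks where the GRU/LSTM used to be. A hypothetical sketch of that pattern; the blocks themselves would come from this repo (e.g. the Block class in RWKV-v5/src/model.py, whose constructor and required args vary by version), and the module below is an illustration rather than a recipe:

    import torch.nn as nn

    class SpeechEnhancer(nn.Module):
        """Drop-in pattern: replace the RNN stack with RWKV blocks mapping (B, T, C) -> (B, T, C)."""
        def __init__(self, n_feat, n_embd, rwkv_blocks):
            super().__init__()
            self.proj_in = nn.Linear(n_feat, n_embd)
            # was: self.rnn = nn.GRU(n_feat, n_embd, batch_first=True)
            self.blocks = nn.ModuleList(rwkv_blocks)   # e.g. [Block(args, i) for i in range(n_layer)]
            self.proj_out = nn.Linear(n_embd, n_feat)

        def forward(self, x):              # x: (B, T, n_feat) spectrogram frames
            x = self.proj_in(x)
            for block in self.blocks:      # each RWKV block keeps the (B, T, C) shape, like an unrolled RNN
                x = block(x)
            return self.proj_out(x)        # per-frame enhancement features / mask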

https://github.com/BlinkDL/RWKV-LM/blob/666f64591e13c68ed6e602e957c5ca47b25750e3/RWKV-v5/cuda/wkv6state_cuda.cu#L15 This line is missing the batch offset and should read: `_s += b*H*_N_*_N_ + h*_N_*_N_ + i*_N_;` Probably why this code didn't work for BPTT when we tried it...
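For reference, the state buffer is laid out as (B, H, N, N), so each thread's base pointer needs the batch stride b*H*N*N in addition to the head and row offsets, which is what the corrected line adds. A quick check of that arithmetic with hypothetical sizes:

    import torch

    B, H, N = 2, 4, 8                          # batch, heads, head size (hypothetical)
    s = torch.arange(B * H * N * N).view(B, H, N, N)

    b, h, i = 1, 3, 5
    # flat offset of element (b, h, i, 0) matches the corrected pointer arithmetic
    assert s[b, h, i, 0].item() == b * H * N * N + h * N * N + i * N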