RWKV-LM
Question about RWKV formula
In the first formula in the README, RWKV is rewritten into recurrent form by letting $W_n=(n-1)w$. Is there a particular reason for using $n-1$ instead of $n$? The latter seems more natural, and in *From GPT to RWKV (the formulas)* the recurrent formula of RWKV also implies the latter. So I suspect you have already tried it but found it suboptimal for some reason.
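For concreteness, the formula shape under discussion looks like the following (a sketch based on the question's notation; the README's exact form may differ):

$$O_n = \frac{\sum_{i=1}^{n} e^{W_{n-i} + K_i}\, V_i}{\sum_{i=1}^{n} e^{W_{n-i} + K_i}}, \qquad W_n = (n-1)\,w \quad \text{vs.} \quad W_n = n\,w.$$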
The $n-1$ form is more expressive. I have only tried the current formula, because I believe it's better.
Note I am treating $W_0$ differently.
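A minimal NumPy sketch of the resulting recurrence, assuming the formula shape above with one scalar $K_i$, $V_i$ per step; the parameter name `w0` for the separately-treated $W_0$ is illustrative, not the repo's actual API:

```python
import numpy as np

def rwkv_recurrent(K, V, w, w0):
    """Recurrent evaluation of
        O_n = sum_{i<=n} e^{W_{n-i}+K_i} V_i / sum_{i<=n} e^{W_{n-i}+K_i}
    with W_n = (n-1)*w for n >= 1 and W_0 = w0 treated as a free parameter."""
    a, b = 0.0, 0.0  # decayed sums over past tokens i < n (numerator / denominator)
    out = []
    for k, v in zip(K, V):
        e_cur = np.exp(w0 + k)  # current token, weighted by the free W_0
        out.append((e_cur * v + a) / (e_cur + b))
        # State update: every past token picks up one more factor of e^w,
        # while token n enters undecayed (it will carry W_1 = 0 at step n+1).
        a = np.exp(w) * a + np.exp(k) * v
        b = np.exp(w) * b + np.exp(k)
    return np.array(out)

# Example: w < 0 so that e^w < 1 acts as a decay over time.
K = np.array([0.1, -0.2, 0.3])
V = np.array([1.0, 2.0, 3.0])
print(rwkv_recurrent(K, V, w=-0.5, w0=0.2))
```

A real implementation would also guard against overflow in the exponentials (e.g., by tracking a running maximum exponent), which is omitted here for clarity.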
That sounds reasonable, because the $n-1$ form makes it possible for $\exp(K_i)V_i$ to appear undecayed in the expression of $O_{i+1}$. Still, it would be better if someone could provide some empirical evidence, so I think it's best to leave this issue open for some time :-)
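To spell out that step: under $W_n = (n-1)w$ the coefficient of the immediately preceding token vanishes,

$$W_1 = (1-1)w = 0 \quad\Longrightarrow\quad e^{W_1 + K_{n-1}}\, V_{n-1} = e^{K_{n-1}}\, V_{n-1},$$

so $\exp(K_{n-1})V_{n-1}$ enters $O_n$ with no decay, whereas under $W_n = nw$ it would already carry a factor of $e^{w}$.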