[BUG] Calculation of alpha and initialization of lambda incorrect

Open ozppupbg opened this issue 3 months ago • 3 comments

Hello,

I noticed two deviations from the Griffin paper in your code.

Lambda

Lambda is currently initialized here: https://github.com/kyegomez/Griffin/blob/83bbfdd9b0698cc27c19439ec16fb4fce07436c9/griffin_torch/main.py#L39-L42 However, the Griffin paper states in the second part of Section 2.4:

We initialize Λ such that a^c is uniformly distributed between 0.9 and 0.999 at the start of training,

and a = sigmoid(Λ).

So the initialization for Lambda should be calculated as Λ = -log((1 / u^(1/c)) - 1), where u = a^c is sampled uniformly between 0.9 and 0.999. This is simply the inverse sigmoid of a = u^(1/c).
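A minimal sketch of this initialization in plain Python, to make the inversion concrete. The function name `init_lambda`, the seed, and the default `c = 8` (the value used in the WolframAlpha check in this issue) are assumptions for illustration and are not taken from the repository; a tensor version would apply the same two steps elementwise.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lambda(n, c=8.0, lo=0.9, hi=0.999, seed=0):
    """Sample n values of Λ so that sigmoid(Λ)**c is uniform in [lo, hi]."""
    rng = random.Random(seed)
    lambdas = []
    for _ in range(n):
        u = rng.uniform(lo, hi)                    # target value of a**c
        a = u ** (1.0 / c)                         # a = u^(1/c)
        lambdas.append(-math.log(1.0 / a - 1.0))   # Λ = inverse sigmoid of a
    return lambdas

lambdas = init_lambda(4)
# Round trip: sigmoid(Λ)**c should land back in [0.9, 0.999].
recovered = [sigmoid(lam) ** 8.0 for lam in lambdas]
print(recovered)
```

The round trip at the end is only a sanity check that the sampled Λ values reproduce a^c in the intended range.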

Alpha

Second, a_t is defined in the paper (equation 3) as:

a_t = a^(c*r_t)

which is nowhere near the formula in the code: https://github.com/kyegomez/Griffin/blob/83bbfdd9b0698cc27c19439ec16fb4fce07436c9/griffin_torch/main.py#L62-L63

The implementation should also follow the recommendation in Appendix A (Implementation) of the paper (equation 6) and compute this operation in log-space: a_t = exp(-c * softplus(-Λ) ⊙ r_t)
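A rough sketch of the log-space form, written over plain Python lists so it runs without dependencies. The name `recurrence_gate`, the sample values, and `c = 8` are assumptions for illustration; in the actual module Λ and r_t would be tensors and the product would be elementwise.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def recurrence_gate(lam, r_t, c=8.0):
    """a_t = exp(-c * softplus(-Λ) ⊙ r_t), elementwise over channels."""
    return [math.exp(-c * softplus(-l) * r) for l, r in zip(lam, r_t)]

lam = [4.0, 6.0, 9.0]   # hypothetical per-channel Λ values
r_t = [0.0, 0.5, 1.0]   # hypothetical recurrence gate outputs in [0, 1]

a_t = recurrence_gate(lam, r_t)
# Reference: the direct definition a_t = sigmoid(Λ) ** (c * r_t).
direct = [(1.0 / (1.0 + math.exp(-l))) ** (8.0 * r) for l, r in zip(lam, r_t)]
print(a_t)
print(direct)
```

Both lists should agree to floating-point precision, and a gate value of r_t = 0 gives a_t = 1.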

Note that the formula in the paper is missing the minus sign before Λ, but you can easily check yourself that it should be there: https://www.wolframalpha.com/input?i=exp%28-8log%281%2Bexp%28-l%29%29%29+%3D+sigmoid%28l%29%5E8 or, for general c: https://www.wolframalpha.com/input?i=exp%28-clog%281%2Bexp%28-l%29%29%29+%3D+sigmoid%28l%29%5Ec
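The same check can be done numerically. The sketch below compares sigmoid(Λ)^c against the version with the minus sign and against the version without it (the form this issue attributes to the paper); the function names and the sample values of Λ and c are chosen only for illustration.

```python
import math

def sigmoid_pow(lam, c):
    # Target: sigmoid(Λ) ** c
    return (1.0 / (1.0 + math.exp(-lam))) ** c

def with_minus(lam, c):
    # exp(-c * softplus(-Λ))
    return math.exp(-c * math.log1p(math.exp(-lam)))

def without_minus(lam, c):
    # exp(-c * softplus(Λ)), i.e. the sign dropped
    return math.exp(-c * math.log1p(math.exp(lam)))

for lam in (-2.0, 0.5, 3.0):
    for c in (1.0, 8.0):
        print(lam, c, sigmoid_pow(lam, c), with_minus(lam, c), without_minus(lam, c))
```

For every nonzero Λ tried here, only the version with the minus sign matches sigmoid(Λ)^c.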

Mar 21 '24 14:03 ozppupbg
