Griffin
[BUG] Calculation of alpha and initialization of lambda incorrect
Hello,
I noticed two deviations from the Griffin paper in your code.
Lambda
Lambda is currently initialized here: https://github.com/kyegomez/Griffin/blob/83bbfdd9b0698cc27c19439ec16fb4fce07436c9/griffin_torch/main.py#L39-L42 However, the Griffin paper states in the second part of Section 2.4:
We initialize Λ such that a^c is uniformly distributed between 0.9 and 0.999 at the start of training,
and a = sigmoid(Λ).
So the initialization for Lambda should invert the sigmoid: Λ = -log((1 / u^(1/c)) - 1),
where u = a^c is sampled uniformly between 0.9 and 0.999 (so a = u^(1/c)).
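To make the intended initialization concrete, here is a minimal sketch in plain Python (standard library only, not the repository's PyTorch code). The function name `init_lambda`, its parameters, and the default `c = 8.0` are assumptions for illustration; `u` denotes the sampled value of a^c.

```python
import math
import random


def init_lambda(n, c=8.0, lo=0.9, hi=0.999, seed=None):
    """Return n values of Lambda such that sigmoid(Lambda)**c lies in [lo, hi].

    Hypothetical helper: the name, signature, and default c are assumptions.
    """
    rng = random.Random(seed)
    lambdas = []
    for _ in range(n):
        u = rng.uniform(lo, hi)   # u plays the role of a^c
        a = u ** (1.0 / c)        # a = sigmoid(Lambda), strictly between 0 and 1
        # Inverse sigmoid (logit): Lambda = -log(1/a - 1)
        lambdas.append(-math.log(1.0 / a - 1.0))
    return lambdas
```

In a PyTorch module the same three steps (sample u, take the c-th root, apply the logit) would presumably be done on a tensor and wrapped in a parameter, but that part is not shown here.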
Alpha
Second, a_t is defined in the paper (equation 3) as:
a_t = a^(c*r_t)
which is nowhere near the formula in the code: https://github.com/kyegomez/Griffin/blob/83bbfdd9b0698cc27c19439ec16fb4fce07436c9/griffin_torch/main.py#L62-L63
In addition, the implementation should follow the recommendation from Appendix A (Implementation) of the paper (equation 6) and compute this operation in log-space:
a_t = exp(-c*softplus(-Λ) ⊙ r_t)
Note that the formula in the paper is missing the minus sign before Lambda. That it belongs there follows from log(sigmoid(Λ)) = -softplus(-Λ), and you can easily check it yourself: https://www.wolframalpha.com/input?i=exp%28-8log%281%2Bexp%28-l%29%29%29+%3D+sigmoid%28l%29%5E8 or for general c: https://www.wolframalpha.com/input?i=exp%28-clog%281%2Bexp%28-l%29%29%29+%3D+sigmoid%28l%29%5Ec
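The same identity can be checked numerically. The sketch below, in plain Python with only the standard library, compares the log-space form exp(-c*softplus(-Λ)*r_t) against the direct form sigmoid(Λ)^(c*r_t) for scalar inputs. The function names and the default `c = 8.0` are assumptions for illustration, not code from the repository.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def alpha_log_space(lam, r_t, c=8.0):
    # a_t = exp(-c * softplus(-Lambda) * r_t), with the minus sign before Lambda
    return math.exp(-c * softplus(-lam) * r_t)


def alpha_direct(lam, r_t, c=8.0):
    # a_t = a^(c * r_t) with a = sigmoid(Lambda)
    return sigmoid(lam) ** (c * r_t)


# Compare both forms over a few scalar values of Lambda and the gate r_t
for lam in (-3.0, 0.0, 2.5, 6.0):
    for r_t in (0.0, 0.3, 1.0):
        assert math.isclose(alpha_log_space(lam, r_t), alpha_direct(lam, r_t), rel_tol=1e-9)
```

With softplus(Λ) instead of softplus(-Λ) the two forms no longer agree, which is consistent with the sign correction described above.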