Oleg Filatov
Results: 3 comments of Oleg Filatov
I was looking into the muP implementation in `gpt-neox` to contrast it with the `Megatron-LM` setup and accidentally found this issue :) I wonder, could the LR schedule be the...
@marcobellagente93 Oh yes, now the curves are indeed nicely flat, great! :)
@AkshitaB (a very delayed reply, but it might still be helpful) In my experience, I also tried query/readout zero-init and it didn't help. However, what I saw is that while growing at...