Oleg Filatov

3 comments by Oleg Filatov

I was looking into the muP implementation in `gpt-neox` to contrast it with the `Megatron-LM` setup and accidentally found this issue :) I am thinking, could the LR schedule be the...

@AkshitaB (a very delayed reply, but it might still be helpful) In my experience, I also tried query/readout zero-init and it didn't help. However, what I saw is that while growing at...