
Increasing coord check for the network output

Open AkshitaB opened this issue 2 years ago • 2 comments

I'm implementing muP for the OLMo model, and am facing an issue with the coordinate check.

[Images: sp_trsfmr_adamw_coord, μp_trsfmr_adamw_coord]

The l1 that increases with width is that of the network output. Following the docs, I set the readout and query initializations to zero, and I made sure the initialization is applied after set_base_shapes is called.

What other things can I check to debug the issue?

AkshitaB avatar Apr 11 '24 20:04 AkshitaB

Hi @AkshitaB, I'm reproducing muP too these days. Can you share the architecture? Or have you solved the problem?

SeunghyunSEO avatar Jun 21 '24 06:06 SeunghyunSEO

@AkshitaB (very delayed reply but still might be helpful)

From my experience, I also tried query/readout zero-init and it didn't help. However, what I saw is that, while they grow at early iterations, the readout norms do stabilise across widths after a sufficient number of iterations (around 30). You might already see hints of this in your plot for t=4, so running the coordinate check for more steps may flatten your readout norms.

But even if not, it has never stopped muTransfer from working for me in practice. What matters most is that the other layer norms look flat, which is the case for you :)
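
One way to make "flattens after more steps" concrete is to fit, for each step t, the slope of log(l1) against log(width) for the output: a slope near zero means the norm is flat across widths at that step. The sketch below is only an illustration. The helper names (loglog_slope, slopes_by_step) are hypothetical and not part of mup, and the numbers are synthetic placeholders, not measurements from this issue.

```python
import math


def loglog_slope(widths, values):
    """Least-squares slope of log(value) against log(width)."""
    if any(v <= 0 for v in values):
        # With a zero-initialized readout, the output l1 at the very first
        # recorded step may be exactly 0; start from a later step instead.
        raise ValueError("l1 values must be positive to take logs")
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def slopes_by_step(l1_by_step, widths):
    """Map each step t to the fitted width exponent of the recorded l1."""
    return {t: loglog_slope(widths, vals)
            for t, vals in sorted(l1_by_step.items())}


# Synthetic placeholder data (NOT from the issue): l1 = 0.1 * width**a,
# with an exponent a that shrinks as training proceeds.
widths = [128, 256, 512, 1024]
exponents = {2: 0.5, 4: 0.3, 30: 0.0}
l1_by_step = {t: [0.1 * w ** a for w in widths] for t, a in exponents.items()}

for t, slope in slopes_by_step(l1_by_step, widths).items():
    print(f"t={t:>2}  slope={slope:+.2f}")
```

If the fitted slope for the output shrinks toward zero as t grows while the other layers stay near zero throughout, that matches the behaviour described above.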

ofivite avatar Jun 21 '24 11:06 ofivite