
MuP for RNNs

Open norikazu99 opened this issue 1 year ago • 0 comments

Hello, your paper seems to cover linear layers, convs, and transformers, but not RNNs. Was this just to reduce the number of experiments, or is there a more fundamental reason behind the choice? If it was just to reduce the number of experiments, how should h0 be handled? Would you recommend zeroing out h0, or does it need to be initialized using mup.init.normal?
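
To make the h0 question concrete, here is a minimal sketch of the two options in plain Python, with no mup calls. `init_h0` and `rnn_step` are hypothetical helpers written only for illustration, and the `std` argument is a placeholder, not a scale that muP prescribes.

```python
import math
import random


def init_h0(batch_size, hidden_size, mode="zeros", std=1.0, seed=0):
    """Build an initial hidden state as a list of rows (hypothetical helper).

    mode="zeros"  -> option 1: h0 is all zeros.
    mode="normal" -> option 2: h0 is sampled i.i.d. from N(0, std^2).
    The right `std` under muP is exactly the open question; 1.0 is a placeholder.
    """
    if mode == "zeros":
        return [[0.0] * hidden_size for _ in range(batch_size)]
    if mode == "normal":
        rng = random.Random(seed)
        return [
            [rng.gauss(0.0, std) for _ in range(hidden_size)]
            for _ in range(batch_size)
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def rnn_step(x, h, w_in, w_rec):
    """One tanh RNN step for a single example: h' = tanh(w_in @ x + w_rec @ h)."""
    return [
        math.tanh(
            sum(wi * xi for wi, xi in zip(w_in[j], x))
            + sum(wr * hi for wr, hi in zip(w_rec[j], h))
        )
        for j in range(len(w_rec))
    ]


# Option 1: zeroed h0. Option 2: Gaussian h0 with a placeholder std.
h0_zero = init_h0(batch_size=2, hidden_size=4, mode="zeros")
h0_rand = init_h0(batch_size=2, hidden_size=4, mode="normal", std=1.0)
```

With a zeroed h0, the first step depends only on the input term, so the recurrent weights do not affect the first hidden state. With a sampled h0, they do, which is why its scale relative to the width matters.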

Thank you.

norikazu99 • Jul 26 '24 02:07