mup
add width_mult to optimizer dict
Add the width_mult key to the MuAdam state dictionary to make the class easier to use, e.g. to enable its correct use in https://github.com/EleutherAI/gpt-neox
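
A minimal sketch of why storing the multiplier can help, written in plain Python rather than against the actual mup code. The helper names (build_param_groups, set_base_lr) and the group layout are hypothetical, and it assumes the downstream framework overwrites each group's lr from a schedule, which would otherwise discard the per-group width scaling:

```python
def build_param_groups(named_width_mults, base_lr):
    """Group parameters by width multiplier (hypothetical helper, not mup's API).

    named_width_mults: iterable of (param_name, width_mult) pairs.
    Each group keeps its multiplier under the "width_mult" key.
    """
    groups = {}
    for name, width_mult in named_width_mults:
        group = groups.setdefault(
            width_mult,
            {"params": [], "lr": base_lr / width_mult, "width_mult": width_mult},
        )
        group["params"].append(name)
    return list(groups.values())


def set_base_lr(param_groups, new_base_lr):
    """Apply a new base learning rate while preserving the width scaling.

    Assumption: a scheduler would otherwise write new_base_lr into every
    group directly; reading "width_mult" lets it rescale per group instead.
    """
    for group in param_groups:
        group["lr"] = new_base_lr / group["width_mult"]


param_groups = build_param_groups(
    [("embedding", 1.0), ("attention", 4.0), ("mlp", 4.0)], base_lr=0.1
)
set_base_lr(param_groups, 0.2)
```

Without the stored key, code outside the optimizer would have to recompute the multiplier from the parameters themselves before it could rescale the learning rate.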