Results: 2 issues of marcobellagente93
The current MuP implementation in neox is buggy. This PR provides the main functionality without major changes to the code. Current limitations: - only supports non-tied models - does...
Adds the `width_mult` key to the MuAdam state dictionary to make the class easier to use, e.g. to enable its correct use in https://github.com/EleutherAI/gpt-neox