OLMo
MoE
Replaces https://github.com/allenai/OLMo/pull/541
Notes:
- I didn't find `norm_after` to work well, but I added it to conform with other parts of the code (I can also remove it).
- I only left in the config file used for the final 5T run.
- I didn't include all configurations that we ran for OLMoE (e.g., expert choice); I will probably put instructions for those in a separate `olmoe` repository for people who want to reproduce the runs exactly.
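For readers unfamiliar with the option, here is a minimal sketch of what `norm_after` toggles. It assumes the flag switches from the usual pre-norm residual (normalize the sublayer input) to normalizing the sublayer output before the residual add, as in Swin Transformer V2. `layer_norm` and `residual_block` are hypothetical helpers for illustration, not OLMo's code; the real model uses learned norm parameters and actual attention/MoE layers.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale/bias, for brevity)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(
    x: Vector, sublayer: Callable[[Vector], Vector], norm_after: bool
) -> Vector:
    """One residual sub-block; `sublayer` stands in for attention or the (MoE) MLP.

    Assumed semantics (hypothetical, not taken from OLMo):
      norm_after=False -> pre-norm:         x + sublayer(norm(x))
      norm_after=True  -> norm the output:  x + norm(sublayer(x))
    """
    if norm_after:
        branch = layer_norm(sublayer(x))
    else:
        branch = sublayer(layer_norm(x))
    # Residual add is the same in both variants.
    return [a + b for a, b in zip(x, branch)]
```

With a toy linear sublayer such as `lambda v: [2.0 * t for t in v]`, the pre-norm branch scales with the sublayer's output, while the `norm_after` branch is re-normalized to zero mean and unit variance regardless of that scale.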
Linking a related PR that we should merge after this one: https://github.com/allenai/OLMo/pull/707
If this PR looks good to you, could you approve it, @epwalsh / @dirkgr? :)
All tests are passing except the GPU test, which I assume is expected to fail. Feel free to merge 😊
What's going on with this PR? Can we merge?
Fixed some basics as discussed; I think we can merge!
@dirkgr shall we merge?