Megatron-DeepSpeed
a branch combining layer-norm-auto-sync and ds_ckpt_reshape
We have two branches that aren't ready for main yet, but we need both of them. This branch is a merge of the two, so we will use it for the production run for now; that way it will be ready for checkpoint reshaping when we need it.
I merged https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/272/ into https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/239