Megatron-DeepSpeed
WIP: Shared T5 code
requires: https://github.com/microsoft/DeepSpeed/pull/2035
TODO:
- [x] Make sure we can run shared enc/dec with MLM
- [x] Add a test making sure that it runs with MLM
- [ ] Make sure we can load a BLOOM 6B model on a single A100 GPU
- [ ] Make sure we can load a BLOOM model checkpoint (+ checkpoint manipulation script to share)