Megatron-DeepSpeed
Is there a script for pretraining/fine-tuning BLOOM?
Specifically, I am looking for a script that uses DeepSpeed PP and ZeRO-DP, like this one: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp
As I understand it, this script should be able to load BLOOM with some changes, for example adding `--position-embedding-type alibi`. I have run some experiments, but they keep failing.
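To make the kind of change concrete, here is a minimal sketch of swapping the position embedding flag in a launch argument list. Only `--position-embedding-type alibi` comes from this issue; the helper name `with_alibi` and the placeholder values in `base_args` are assumptions for illustration, and BLOOM very likely needs further changes beyond this one flag.

```python
def with_alibi(base_args):
    """Return a copy of base_args with the position embedding type set to alibi.

    Any existing --position-embedding-type flag and its value are dropped first,
    so the flag is never passed twice.
    """
    args = []
    skip_next = False
    for arg in base_args:
        if skip_next:
            # This is the value that belonged to the removed flag.
            skip_next = False
            continue
        if arg == "--position-embedding-type":
            skip_next = True
            continue
        args.append(arg)
    return args + ["--position-embedding-type", "alibi"]


# Placeholder arguments, not a real BLOOM configuration.
base_args = ["--num-layers", "2", "--hidden-size", "64"]
print(" ".join(with_alibi(base_args)))
```

The resulting list would then be appended to whatever launch command the linked script already uses.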
I would really appreciate it if someone could give me some advice!