Stas Bekman
Please see: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/308
The original Meg-DS checkpoint is here: https://huggingface.co/bigscience/bloom-optimizer-states
> is there an equivalent checkpoint for inference that is in the Meg-DS format?

https://huggingface.co/bigscience/bloom-optimizer-states is the full Meg-DS checkpoint. edit: hmm, I think you're correct, it's incomplete. I will...
@asafkar, so it looks like I created the new repo for nothing; https://huggingface.co/bigscience/bloom-optimizer-states was already the full checkpoint. Why did you say it only had optim state files and...
- DS-Inference = TP
- DS-ZeRO = TP-like
- Accelerate = PP
- Megatron-Deepspeed = TP+PP

(plus DP in all; see the DS-Inference sketch below)
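To make the DS-Inference = TP row concrete, here is a minimal sketch using the public `deepspeed.init_inference` API. The small `bigscience/bloom-560m` checkpoint is just an illustrative stand-in for the full model, and the TP degree is assumed to come from the launcher's `WORLD_SIZE`:

```python
import os

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-ins: a small BLOOM variant, and the TP degree taken
# from the env var set by the deepspeed launcher.
model_name = "bigscience/bloom-560m"
world_size = int(os.getenv("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# DS-Inference = TP: each layer's weight matrices get split across the ranks,
# so every GPU holds roughly 1/world_size of the model.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=world_size,               # tensor-parallel degree
    dtype=torch.float16,
    replace_with_kernel_inject=True,  # use DeepSpeed's fused inference kernels
)
model = ds_engine.module

inputs = tokenizer("DeepSpeed-Inference applies tensor parallelism", return_tensors="pt")
inputs = {k: v.to(torch.cuda.current_device()) for k, v in inputs.items()}
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Run with e.g. `deepspeed --num_gpus 8 script.py`; the launcher spawns one rank per GPU and sets `WORLD_SIZE` accordingly.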
Could you be a bit more specific, Iz? To run Meg-DS training? I have a more or less ready AWS image that I created for the CI - but I'm definitely...
The only problem with this pre-made image is that our components are in flux - e.g. we get fixes in the deepspeed repo, Meg-DS gets changed too, and so are...
@philschmid, could you please help me to make this image we made for Megatron-Deepspeed CI somehow available to the wider group? Basically anybody at BigScience. I'm not sure if we...
That would be fantastic! Thank you, Philipp! I think a few small tweaks will be needed to the last one I created. As the latter was done for CI and...
@ibeltagy, is this going to be used on EC2 on user's personal account or some HF account or else?