Asaf Karnieli
Hi @stas00, Besides the optimizer states, is there an equivalent checkpoint for inference that is in the Meg-DS format, which I can use with deepspeed inference, in order to run...
@mayank31398 I was actually referring to the other way around - i.e. I want to do ds inference, while using pipeline parallelism. For that to work (if I understand correctly),...
@stas00 I haven't checked the https://huggingface.co/bigscience/bloom-optimizer-states repo yet, I was merely asking whether it will support what I'm trying to do - I'm sorry if I wasn't clear enough about...
Regarding BLOOM models, downgrading DeepSpeed to 0.7.6 works for me. With 0.7.7 / 0.8.0 I get this error (using this script: https://github.com/huggingface/transformers-bloom-inference/blob/main/bloom-inference-scripts/bloom-ds-inference.py)
> @RezaYazdaniAminabadi apologies I spoke too soon... it's now working for BLOOM 175B with the pre-sharded fp16 weights, but not the original `.bin` checkpoint shards (which do work with 0.7.6)....
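To make the version report above easier to act on, here is a minimal sketch that checks whether the installed DeepSpeed matches the one release reported to work in these comments (0.7.6). The helper names (`parse_version`, `is_reported_working`, `installed_deepspeed_version`) are hypothetical, and the "known good" value comes only from the comments above, not from any official compatibility list.

```python
import re
from importlib import metadata

# Assumption: 0.7.6 is the only release the comments above report as working
# for BLOOM with the original checkpoint shards; 0.7.7 and 0.8.0 were reported to fail.
REPORTED_WORKING = (0, 7, 6)


def parse_version(text):
    """Turn a version string such as '0.7.6' or '0.7.6+abc' into a tuple of ints."""
    core = text.split("+")[0]  # drop any local build suffix
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_reported_working(version_text):
    """True only for the exact release reported to work in the thread."""
    return parse_version(version_text) == REPORTED_WORKING


def installed_deepspeed_version():
    """Return the installed DeepSpeed version string, or None if it is not installed."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_deepspeed_version()
    if found is None:
        print("DeepSpeed is not installed.")
    elif is_reported_working(found):
        print(f"DeepSpeed {found} matches the release reported to work.")
    else:
        print(f"DeepSpeed {found} differs from 0.7.6; consider pinning it.")
```

If the check reports a different release, pinning with `pip install deepspeed==0.7.6` reproduces the downgrade described above.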