Sam Ade Jacobs
We confirm your solution is correct: zero.init is only needed for ZeRO stage 3, and it can be disabled by setting its `enabled` flag to false. This will make...
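One way to tie the flag to the configured stage is sketched below. It assumes the usual DeepSpeed config layout (`zero_optimization.stage`, defaulting to 0 when absent). The helper name `zero_init_enabled` is hypothetical, and the `deepspeed.zero.Init(enabled=...)` call is shown only in a comment so the sketch runs without DeepSpeed installed.

```python
def zero_init_enabled(ds_config):
    """Return True only when the DeepSpeed config selects ZeRO stage 3."""
    # Missing "zero_optimization" or "stage" keys are treated as stage 0.
    stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    return stage == 3


# Hypothetical usage with DeepSpeed (not executed here):
#   with deepspeed.zero.Init(enabled=zero_init_enabled(ds_config)):
#       model = build_model()  # build_model is a placeholder constructor
```

With this approach, stages 0 to 2 build the model normally, and only stage 3 partitions parameters at construction time.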
User error, closing.
@AnthoJack, a batch size of 4 may be too small to observe a speedup from parallel training. As for model accuracy, we tested the provided repro; our current observation is that DeepSpeed...
This might be a bug in the hybrid engine: in Step 3, the generated sequence is wrong when the hybrid engine is enabled.
@amaarora, to disable the hybrid engine, remove the `--enable_hybrid_engine` flag from the training script.
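Removing the flag works because such options are usually declared as boolean switches that default to off. The sketch below assumes a DeepSpeed-Chat style script where `--enable_hybrid_engine` uses argparse `store_true` and feeds a `hybrid_engine.enabled` config entry. The helper names `build_parser` and `hybrid_engine_config` are hypothetical, and real configs carry more keys.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # store_true: the option stays False unless the flag is passed explicitly.
    parser.add_argument("--enable_hybrid_engine", action="store_true")
    return parser


def hybrid_engine_config(enable_hybrid_engine):
    # Minimal config fragment; only the hybrid engine switch is shown.
    return {"hybrid_engine": {"enabled": enable_hybrid_engine}}


args = build_parser().parse_args([])  # flag omitted from the command line
config = hybrid_engine_config(args.enable_hybrid_engine)
print(config)  # {'hybrid_engine': {'enabled': False}}
```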
> @samadejacobs I'm glad to see this PR will be merged soon. When are you going to support sdpa in the future? It's useful for me.

@glowwormX, future support would...
> @samadejacobs anything I can do to help get this merged?

@ArthurZucker, many thanks; please see my earlier response.
@Xirid, ZeRO stage 3 is currently not supported in DeepSpeed's long-context parallelism (Ulysses). ZeRO-3 support is on our roadmap; contributions are welcome!
@glowwormX, to be clear: Z3 is supported with the Megatron-DeepSpeed client. Support for the HF client is on our roadmap, with no ETA at this point. Contributions are welcome!