Sam Ade Jacobs

18 comments by Sam Ade Jacobs

We acknowledge that your solution is correct: zero.init is only needed for ZeRO stage 3. It can be disabled by setting the `enabled` flag to false. This will make...
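As a minimal sketch of the point above, the helper below decides whether ZeRO-style initialization is needed by reading the stage from a DeepSpeed-style config dictionary. The helper name `needs_zero_init`, the sample configs, and `build_model` are hypothetical and used only for illustration; the assumption is that the stage lives under `zero_optimization.stage`, as in a standard DeepSpeed JSON config.

```python
def needs_zero_init(ds_config: dict) -> bool:
    """Return True only when the config selects ZeRO stage 3.

    Hypothetical helper: assumes the stage is stored under
    ds_config["zero_optimization"]["stage"]; a missing entry counts as stage 0.
    """
    stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    return stage == 3


# Sample configs, made up for illustration.
stage2_config = {"zero_optimization": {"stage": 2}}
stage3_config = {"zero_optimization": {"stage": 3}}

print(needs_zero_init(stage2_config))
print(needs_zero_init(stage3_config))

# With DeepSpeed installed, the result could be forwarded to the `enabled`
# flag (sketch only, not executed here; build_model is hypothetical):
#   with deepspeed.zero.Init(enabled=needs_zero_init(cfg)):
#       model = build_model()
```

Deriving the flag from the config keeps the stage setting as the single source of truth, so switching between stages does not require touching the model construction code.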

@AnthoJack, a batch size of 4 may be too small to observe a speedup from parallel training. As for model accuracy, we tested the repro provided; our current observation is that DeepSpeed...

@amaarora, to disable the hybrid engine, remove the `--enable_hybrid_engine` flag from the training script.

> @samadejacobs I'm glad to see this PR will be merged soon. When are you going to support SDPA in the future? It's useful for me.

@glowwormX, future support would...

> @samadejacobs anything I can do to help get this merged?

@ArthurZucker, many thanks, please see my earlier response.

@Xirid, ZeRO stage 3 is currently not supported in DeepSpeed long-context parallelism (Ulysses). ZeRO stage 3 support is on our roadmap; contributions are welcome!

@glowwormX, to be clear, ZeRO stage 3 (Z3) is supported with the Megatron-DeepSpeed client. Support for the HF client is on our roadmap, with no ETA at this point; contributions are welcome!