Olatunji Ruwase
@mbetser, thanks for reporting this error. Can you please share a simple script and steps to reproduce this issue?
@xylian86, can you please help with this?
@Griffintaur, can you please see if this new API can help? https://github.com/microsoft/DeepSpeed/pull/4966
@Liangliang-Ma, apologies for the delay. I am still thinking about your last comment, but will not delay this PR.
@torshie, thanks for the update. We have only tested cpu-offload with ZeRO stage 2, not with stage 1. I hope ZeRO stage 2 can work for your scenario,...
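For reference, here is a minimal sketch of the tested combination: a ZeRO stage 2 config with optimizer-state CPU offload. Field names follow the public DeepSpeed config schema; the batch size is a placeholder.

```python
# Minimal DeepSpeed config sketch: ZeRO stage 2 with optimizer-state
# CPU offload. Batch size is a placeholder for illustration.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 2,              # ZeRO stage 2: shard optimizer states and gradients
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host memory
            "pin_memory": True,  # pinned host buffers for faster transfers
        },
    },
    "fp16": {"enabled": True},
}

print(ds_config["zero_optimization"]["stage"])
```

A config like this would typically be passed to `deepspeed.initialize(..., config=ds_config)`; switching `"stage"` to 1 is the untested case mentioned above.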
@nelyahu, I was unaware, so thanks for bringing this to my attention.
@dogacancolak-kensho, you need to create a PR to be reviewed in order to merge your changes. As standard practice, contributors cannot push directly to the main branch.
> i did offline debugging of those failure and improved the code change so it will pass

@nelyahu, it's great that you narrowed this down. Do you think a unit...
ZeRO-Inference is composable with Megatron-style TP. That is, TP is implemented on the client side.
I assume you are referring to [kv cache offloading](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/zero_inference) in the latest zero-inference. We did not evaluate with TP, but I expect it should work.
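To make the setup concrete, here is a sketch of a ZeRO-Inference-style config: ZeRO stage 3 with parameters offloaded to CPU. Field names follow the public DeepSpeed config schema, but this is illustrative, not the exact config from the linked example.

```python
# Sketch of a ZeRO-Inference-style config: ZeRO stage 3 with model
# parameters offloaded to CPU, so weights are streamed to GPU on demand.
zero_inference_config = {
    "train_batch_size": 1,       # inference-only: no gradient accumulation
    "zero_optimization": {
        "stage": 3,              # stage 3 shards (and can offload) parameters
        "offload_param": {
            "device": "cpu",     # hold full weights in host memory
            "pin_memory": True,
        },
    },
    "fp16": {"enabled": True},
}

print(zero_inference_config["zero_optimization"]["offload_param"]["device"])
```

Under client-side Megatron-style TP, each rank would apply a config like this to its own TP shard; as noted above, that combination was not evaluated but is expected to work.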