Samyam Rajbhandari
@szhengac You are correct: LAMB and LARS implementations that are not aware of ZeRO will not work correctly with ZeRO. This is not a fundamental limitation of optimizer partitioning though,...
Hi Nathan, thank you for trying out DeepSpeed. I am a researcher on the DeepSpeed team. I wanted to share a few comments here that might be helpful: Small Models:...
@champson, @yefanhust are there specific models/scenarios you are looking to apply pipeline parallelism to? The scenarios in which PP is helpful for inference are very narrow, and applicable in just a...