Olatunji Ruwase

Results: 559 comments by Olatunji Ruwase

@buttercutter, I think the issue is that ZeRO does not optimize the memory consumption of activations. Can you try running with a `train_micro_batch_size_per_gpu` of 1 to test this? Also, assuming...
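A minimal sketch of what that test could look like, assuming the model and ZeRO settings are whatever you already use (the toy model and config values below are placeholders):

```python
import torch
import deepspeed

# Toy stand-in for the actual model; only the config change matters here.
model = torch.nn.Linear(1024, 1024)

# Hypothetical ds_config: drop the micro batch size to 1 so that activation memory,
# rather than ZeRO-managed states, dominates any remaining memory growth.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {"stage": 2},  # placeholder; keep whatever stage you already use
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```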

@buttercutter, if the smaller micro batch size works, you might try activation checkpointing to address this. Here are some docs: 1. PyTorch: https://pytorch.org/docs/stable/checkpoint.html 2. DeepSpeed: https://deepspeed.readthedocs.io/en/latest/activation-checkpointing.html
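For the PyTorch variant, a rough sketch of where the `checkpoint` call goes; the block and model structure here are made up purely for illustration:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """Toy feed-forward block standing in for whatever layer dominates activation memory."""
    def __init__(self, dim=1024):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.ff(x)

class Model(torch.nn.Module):
    def __init__(self, depth=12, dim=1024):
        super().__init__()
        self.blocks = torch.nn.ModuleList(Block(dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            # Recompute each block's activations during backward instead of storing them.
            x = checkpoint(block, x, use_reentrant=False)
        return x
```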

@buttercutter, yes, this is not surprising, as it indicates a mismatch between the batch size passed to the client script via the command line and the batch size in the ds_config....
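For context, DeepSpeed expects the effective batch size to be consistent, i.e. `train_batch_size == train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. A quick sanity check along these lines (function and argument names are illustrative) can surface the mismatch:

```python
def check_batch_config(ds_config, cli_batch_size, world_size):
    """Illustrative check that the CLI batch size matches what ds_config implies."""
    micro = ds_config["train_micro_batch_size_per_gpu"]
    accum = ds_config.get("gradient_accumulation_steps", 1)
    effective = micro * accum * world_size
    if cli_batch_size != effective:
        raise ValueError(
            f"--batch_size={cli_batch_size} but ds_config implies "
            f"{micro} x {accum} x {world_size} = {effective}"
        )
```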

@buttercutter, thanks for the update. No, we are not solving the problem by reducing the batch size; rather, we are trying to confirm whether the memory bloat is one that ZeRO is...

@buttercutter, yes, I recommend DeepSpeed's activation checkpointing because it supports offloading the activation inputs to CPU memory. In terms of enabling it, there are two parts involved. 1. Wrap the...
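To make the two parts concrete, here is a rough sketch; the config values and the wrapped layers are placeholders rather than a drop-in recipe:

```python
import deepspeed

# Part 1 (sketch): enable activation checkpointing with CPU offload in ds_config.
# Values are illustrative, not tuned recommendations.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": True,  # offload checkpointed activation inputs to CPU memory
        "contiguous_memory_optimization": False,
        "synchronize_checkpoint_boundary": False,
    },
}

# Depending on the setup, deepspeed.checkpointing.configure(None, deepspeed_config=...)
# may also be needed so the checkpointing module picks up the settings above.

# Part 2 (sketch): route the recompute through DeepSpeed's checkpoint API inside the
# model's forward; `blocks` and `x` are placeholders for your layers and input.
def forward_with_checkpointing(blocks, x):
    for block in blocks:
        x = deepspeed.checkpointing.checkpoint(block, x)
    return x
```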

@delock, apologies for the delay. We are still iterating on our thoughts and will sync with you asap. As you might notice, we have linked a PR that builds on yours....

@delock, I notice you merged #2320. Is the PR already working correctly for you? Thanks!

> > > > We merged the class definition and now we are modifying all accel_runtime and literal_device call sites to use get_accelerator(). We are still testing internally before we...
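For anyone following along, the call-site change being described looks roughly like this, shown against the `deepspeed.accelerator` interface as it exists today (exact method names may have differed while the PR was in flight):

```python
import torch
from deepspeed.accelerator import get_accelerator

# Before: hard-coded CUDA literals at the call site, e.g.
#   device = torch.device("cuda", torch.cuda.current_device())
#   torch.cuda.empty_cache()

# After: the accelerator abstraction picks the right backend (CUDA, CPU, XPU, ...).
device = torch.device(get_accelerator().current_device_name())
get_accelerator().empty_cache()
```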

@stas00, I don't intend for this to be merged. Rather, I am sharing this PR to get your feedback for the generalization effort. As discussed earlier, the core logic will...

@stas00, thanks for helping with this issue. Yes, documentation is on my TODO. I think linking to your recipe is a great solution for now.