Liran Bachar
When activation checkpointing with contiguous checkpoints is enabled, calling eval_batch() consecutively results in: File "/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 421, in partition_activations contiguous_data_buffers[i][data_offsets[i]].data[range( IndexError: list index out of range. Must call deepspeed.runtime.activation_checkpointing.reset()...
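A toy model of the failure mode described above may help. It is not DeepSpeed code: `ContiguousSlots`, `save` and this `eval_batch` are hypothetical stand-ins, and the sketch assumes the contiguous buffers behave like a fixed-length list indexed by an offset that only an explicit reset rewinds.

```python
class ContiguousSlots:
    """Toy stand-in for per-layer contiguous checkpoint buffers (hypothetical)."""

    def __init__(self, num_layers):
        self.slots = [None] * num_layers  # one preallocated slot per checkpointed layer
        self.offset = 0                   # index of the next free slot

    def save(self, activation):
        # Raises IndexError once offset has reached num_layers.
        self.slots[self.offset] = activation
        self.offset += 1

    def reset(self):
        # Rewind so the next forward pass reuses the slots from the start.
        self.offset = 0


def eval_batch(store, activations):
    # Forward only: in this toy, nothing rewinds the offset afterwards.
    for act in activations:
        store.save(act)


store = ContiguousSlots(num_layers=2)
eval_batch(store, ["a0", "a1"])      # first call fills both slots

try:
    eval_batch(store, ["b0", "b1"])  # second call indexes past the end
    overflowed = False
except IndexError:
    overflowed = True

store.reset()                        # explicit reset between forward-only calls
eval_batch(store, ["c0", "c1"])      # works again after the reset
```

Under these assumptions the second consecutive call fails with the same kind of IndexError, and a reset between calls avoids it.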
extra_large_param_to_reduce is not used when contiguous_gradients is False, yet it keeps a reference to the param for the lifetime of the application.
Grad tensors that don't fit in the bucket flat buffer are not added to it, but are still added to params_in_ipg_bucket. If such tensors exist, use reduce_scatter of params_in_ipg_bucket instead of...
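The mismatch described above can be illustrated with a small sketch. This is a simplified model, not the DeepSpeed implementation: `fill_bucket`, `bucket_capacity` and the parameter names are hypothetical, and gradients are represented only by their element counts.

```python
def fill_bucket(grad_sizes, bucket_capacity):
    """Toy bucket fill: returns (elements in flat buffer, tracked params, oversized params)."""
    flat_elements = 0
    params_in_bucket = []
    oversized = []
    for name, numel in grad_sizes:
        if numel <= bucket_capacity - flat_elements:
            flat_elements += numel      # gradient is copied into the flat buffer
        else:
            oversized.append(name)      # gradient does not fit, so it is not copied
        params_in_bucket.append(name)   # tracked either way
    return flat_elements, params_in_bucket, oversized


grads = [("w1", 4), ("huge", 100), ("w2", 3)]
flat_elements, params_in_bucket, oversized = fill_bucket(grads, bucket_capacity=10)

# The flat buffer covers fewer elements than the tracked params hold in total,
# so a reduction over the flat buffer alone would miss the oversized gradient.
tracked_elements = sum(numel for _, numel in grads)
needs_param_list_reduce = len(oversized) > 0
```

In this toy, a non-empty `oversized` list is the signal that the reduction should go through the tracked parameter list rather than the flat buffer.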
The compile wrapper will inherit from the user module class and copy its __dict__. This should resolve most issues in #5383 except potential extra user forward hooks. @tohtana @loadams
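The idea above can be sketched in plain Python. This is a minimal illustration, not the actual wrapper: `UserModel` and `wrap_like_user_module` are hypothetical names, and a plain class stands in for a torch module.

```python
class UserModel:
    """Stand-in for a user-defined module class (hypothetical)."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return x * self.scale


def wrap_like_user_module(module):
    """Build a wrapper whose class inherits from the user's class and shares its state."""
    user_cls = type(module)

    class CompiledWrapper(user_cls):
        def __init__(self, wrapped):
            # Deliberately skip user_cls.__init__: the attributes are copied from
            # the already-constructed instance instead of being rebuilt.
            self.__dict__.update(wrapped.__dict__)
            self._is_compiled = False  # wrapper-specific state (illustrative)

    return CompiledWrapper(module)


wrapped = wrap_like_user_module(UserModel(scale=3))

# isinstance checks against the user's class still pass on the wrapper,
# and attributes and methods resolve as they did on the original module.
is_user_model = isinstance(wrapped, UserModel)
result = wrapped.forward(2)
```

Because the wrapper is a subclass rather than a container holding the module, type checks written against the user's class keep working without unwrapping.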
Hi. Please review the following changes. I added support for BF16 to CPU Adam. BF16, FP16 and float are supported at compilation time; the correct template is called at runtime...
CompiledModuleWrapper is implemented as a wrapper class for the model. I see a few issues when running unit tests with compile enabled. 1. isinstance(self.module, PipelineModule) used in multiple places in...