Eugene Eisenstein
This is a proposed fix for #3202. It makes the deepspeed.zero.Init() context ignore any nested enters, i.e. any additional enters that were not preceded by an exit. It also fixes...
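As a rough illustration of what "ignoring nested enters" can mean, here is a minimal sketch of a context manager that only activates on the outermost enter and only deactivates on the matching outermost exit. This is not the DeepSpeed implementation: the class `IdempotentInit`, its `depth` counter, and its `events` log are hypothetical names made up for this example, and the real `deepspeed.zero.Init()` does considerably more work on activation.

```python
class IdempotentInit:
    """Hypothetical stand-in for a context like deepspeed.zero.Init().

    Nested enters are counted but otherwise ignored: the "real" setup and
    teardown run only for the outermost enter/exit pair.
    """

    depth = 0    # number of currently open enters, shared across instances
    events = []  # log of real activations/deactivations, for illustration only

    def __enter__(self):
        cls = type(self)
        if cls.depth == 0:
            # Outermost enter: this is where partitioning would be switched on.
            cls.events.append("activate")
        cls.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        cls = type(self)
        cls.depth -= 1
        if cls.depth == 0:
            # Outermost exit: this is where partitioning would be switched off.
            cls.events.append("deactivate")
        return False  # do not swallow exceptions


with IdempotentInit():
    with IdempotentInit():  # nested enter: no second activation
        pass
    # Still active here; the inner exit did not deactivate anything.

print(IdempotentInit.events)
```

Keeping the counter on the class rather than on the instance is what lets the sketch detect nesting across separately constructed contexts, which is the situation the description above is concerned with.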
**Describe the bug** Intuitively, the Init() context seems like it should be idempotent. It should activate model partitioning, and calling it again shouldn't have any unexpected consequences. However, currently, nesting...
The previous PR, #4416, had too many issues, so I am closing that one and re-opening it here. This PR includes a passing test. This is a proposal for an implementation of checkpointing models when training...