accelerate
TE convert model with deferred initialization
This PR adds a memory-efficient way of converting models with Transformer Engine via lazy weight initialization. Transformer Engine added deferred initialization in https://github.com/NVIDIA/TransformerEngine/pull/596; this change pulls it into the `convert_model` function. Loading large models directly into memory can cause OOMs, especially in FSDP training workflows. Deferring initialization avoids materializing the full model before it is passed into an FSDP wrapper.
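The deferred-initialization idea can be sketched with plain PyTorch meta tensors. This is an illustration of the general technique, not the exact `convert_model` code path; the module and device names below are chosen for the example:

```python
import torch
import torch.nn as nn

# Deferred (lazy) initialization: construct the module under the "meta"
# device so parameters carry only shape/dtype metadata and allocate no
# real storage. A large model can thus be built without OOM.
with torch.device("meta"):
    model = nn.Linear(4096, 4096)

# Parameters exist structurally but are not materialized yet.
assert model.weight.is_meta

# Materialize storage only when actually needed (e.g. after conversion,
# or let the FSDP wrapper shard and initialize per rank). Note that
# to_empty() allocates uninitialized memory; weights still need a real
# init (or a checkpoint load) afterwards.
model = model.to_empty(device="cpu")
assert not model.weight.is_meta
```

In the FSDP case, each rank only materializes its own shard after wrapping, which is what makes this approach memory-efficient for large models.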
Review
- Fully-Sharded Data Parallelism: @SunMarc @zach-huggingface