Results: 2 issues of mayukh-stackav
When attempting to convert large models (e.g., Llama-405) to transformer_engine layers via the convert_model function, I'm encountering out-of-memory (OOM) errors. This seems to happen because the current implementation keeps...
This PR adds a memory-efficient way of converting models to Transformer Engine via lazy weight initialization. Transformer Engine added deferred initialization in https://github.com/NVIDIA/TransformerEngine/pull/596; this PR pulls that capability into the convert_model function.
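The idea behind lazy conversion can be sketched with PyTorch's meta device: replacement layers are created without allocating real storage, so swapping modules in a large model does not double its memory footprint. This is a minimal sketch, not the PR's actual code; it substitutes `nn.Linear` as a stand-in for the Transformer Engine layer (which needs a GPU), and `convert_linear_lazily` is a hypothetical helper name.

```python
# Sketch: replace every nn.Linear in a model with a lazily-initialized
# substitute built on the meta device, so no new weight memory is
# allocated during conversion. In the real PR the substitute would be a
# Transformer Engine layer using its deferred-initialization support.
import torch
import torch.nn as nn

def convert_linear_lazily(model: nn.Module) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            # Construct the replacement under the meta device: parameters
            # get shapes and dtypes but no backing storage.
            with torch.device("meta"):
                replacement = nn.Linear(
                    child.in_features,
                    child.out_features,
                    bias=child.bias is not None,
                )
            setattr(model, name, replacement)
        else:
            # Recurse into submodules (e.g. transformer blocks).
            convert_linear_lazily(child)
    return model
```

After conversion, the meta-device parameters would be materialized (e.g. via `to_empty` plus a weight load) only when real storage is actually needed, which is what keeps peak memory low.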