Results: 2 issues of mayukh-stackav
When attempting to convert large models (e.g., Llama-405) to transformer_engine layers via the convert_model function, I'm encountering out-of-memory (OOM) errors. This seems to happen because the current implementation keeps...
This PR adds a memory-efficient way of converting models to Transformer Engine via lazy weight initialization. Transformer Engine added deferred initialization in https://github.com/NVIDIA/TransformerEngine/pull/596; this PR pulls that capability into the convert_model function.
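The idea behind lazy conversion can be sketched with PyTorch's meta device: replacement layers are created without allocating real storage, so swapping modules in a large model does not double its memory footprint. This is a minimal sketch, not the PR's actual code; it substitutes `nn.Linear` as a stand-in for the Transformer Engine layer (which needs a GPU), and `convert_linear_lazily` is a hypothetical helper name.

```python
# Sketch: replace every nn.Linear in a model with a lazily-initialized
# substitute built on the meta device, so no new weight memory is
# allocated during conversion. In the real PR the substitute would be a
# Transformer Engine layer using its deferred-initialization support.
import torch
import torch.nn as nn

def convert_linear_lazily(model: nn.Module) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            # Construct the replacement under the meta device: parameters
            # get shapes and dtypes but no backing storage.
            with torch.device("meta"):
                replacement = nn.Linear(
                    child.in_features,
                    child.out_features,
                    bias=child.bias is not None,
                )
            setattr(model, name, replacement)
        else:
            # Recurse into submodules (e.g. transformer blocks).
            convert_linear_lazily(child)
    return model
```

After conversion, the meta-device parameters would be materialized (e.g. via `to_empty` plus a weight load) only when real storage is actually needed, which is what keeps peak memory low.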