CPU reload double buffer

Open sanandaraj5597 opened this issue 8 months ago • 0 comments

Description

Adds double buffering when reloading activations from CPU to GPU.

This helps reduce memory fragmentation when CPU offloading is used while GPU memory usage is close to its peak.

Note that this feature only works when the modules are symmetrical across sync functions. LLM training is the main use case; DiT and multimodal models are not supported.
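
To illustrate the idea, here is a minimal pure-Python sketch of double-buffered reloading. It is not the code in this PR: `DoubleBufferReloader` and `run_backward` are hypothetical names, and `bytearray` objects stand in for GPU tensors. It only shows that two preallocated buffers are reused alternately, and why every reloaded activation must have the same size.

```python
class DoubleBufferReloader:
    """Toy model: reload offloaded activations into two reusable buffers."""

    def __init__(self, nbytes):
        # Two fixed-size stand-ins for GPU buffers, allocated once up front.
        self.nbytes = nbytes
        self.buffers = [bytearray(nbytes), bytearray(nbytes)]
        self.next_slot = 0

    def reload(self, cpu_activation):
        # Symmetry requirement: each activation must match the buffer size,
        # otherwise the preallocated buffers cannot be reused.
        if len(cpu_activation) != self.nbytes:
            raise ValueError("activation size differs from buffer size")
        slot = self.next_slot
        # Copy in place instead of allocating a fresh buffer per layer.
        self.buffers[slot][:] = cpu_activation
        # Alternate slots so the previous layer's data stays valid
        # while the next one is being reloaded.
        self.next_slot = 1 - slot
        return slot, self.buffers[slot]


def run_backward(cpu_activations):
    """Reload activations in reverse layer order; return the slots used."""
    reloader = DoubleBufferReloader(len(cpu_activations[0]))
    slots = []
    for activation in reversed(cpu_activations):
        slot, _ = reloader.reload(activation)
        slots.append(slot)
    return slots
```

Because the two buffers are allocated once and only overwritten afterwards, the allocator never sees a stream of per-layer allocations and frees, which is the intuition behind the reduced fragmentation. A layer with a different activation size would be rejected in this sketch, which mirrors the symmetry restriction above.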

sanandaraj5597 · Apr 17 '25 16:04