torchtitan
Implement fast checkpoint path
This PR uses shared memory to run asynchronous checkpointing in a separate process. It also implements asynchronous staging, which overlaps checkpoint staging with the next training iteration.
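The following is a minimal, stdlib-only sketch of the two ideas, not the code in this PR. It rests on several assumptions. JSON-serializable Python lists stand in for model tensors, and a background thread's serialization stands in for staging. A single fixed-size `multiprocessing.shared_memory` buffer hands data to the writer. The platform is POSIX, so the `fork` start method is available. The names `BUF_SIZE`, `writer_loop`, `AsyncCheckpointer`, and `train` are all hypothetical. The trainer stages state into shared memory on a thread, and a separate writer process copies the buffer to disk. The next iteration's read-only forward/backward work runs while staging is still in progress.

```python
import json
import multiprocessing as mp
import os
import threading
from multiprocessing import shared_memory

BUF_SIZE = 1 << 16  # hypothetical fixed size of the shared staging buffer (64 KiB)


def writer_loop(shm_name, ready, done, out_dir):
    """Runs in a separate process: persists staged bytes from shared memory to disk."""
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to the trainer's buffer
    try:
        while True:
            item = ready.get()  # (step, nbytes), or None to shut down
            if item is None:
                break
            step, nbytes = item
            with open(os.path.join(out_dir, f"step_{step}.json"), "wb") as f:
                f.write(bytes(shm.buf[:nbytes]))
            done.put(step)  # the staging buffer may now be reused
    finally:
        shm.close()


class AsyncCheckpointer:
    """Hypothetical helper: one shared-memory buffer, one background writer process."""

    def __init__(self, out_dir):
        ctx = mp.get_context("fork")  # assumption: a POSIX platform with fork
        self.shm = shared_memory.SharedMemory(create=True, size=BUF_SIZE)
        self.ready, self.done = ctx.Queue(), ctx.Queue()
        self.writer = ctx.Process(target=writer_loop,
                                  args=(self.shm.name, self.ready, self.done, out_dir))
        self.writer.start()  # forked before any staging thread exists
        self.stager = None  # thread currently copying state into the buffer
        self.pending = 0  # saves handed to the writer but not yet confirmed
        self.error = None

    def _stage(self, step, state):
        # Staging: serialize the live state into shared memory, then notify the writer.
        try:
            data = json.dumps(state).encode()
            if len(data) > BUF_SIZE:
                raise ValueError("state does not fit in the staging buffer")
            self.shm.buf[: len(data)] = data
            self.ready.put((step, len(data)))
            self.pending += 1
        except Exception as exc:  # re-raised on the training thread
            self.error = exc

    def wait_staging(self):
        # Call before mutating `state`: the stager may still be reading it.
        if self.stager is not None:
            self.stager.join()
            self.stager = None
        if self.error is not None:
            err, self.error = self.error, None
            raise err

    def _drain(self):
        # Block until the writer has persisted every staged checkpoint.
        while self.pending:
            self.done.get()
            self.pending -= 1

    def save(self, step, state):
        self.wait_staging()
        self._drain()  # single buffer: the previous checkpoint must be on disk first
        self.stager = threading.Thread(target=self._stage, args=(step, state))
        self.stager.start()  # returns at once; staging overlaps the next iteration

    def close(self):
        try:
            self.wait_staging()
            self._drain()
        finally:
            self.ready.put(None)
            self.writer.join()
            self.shm.close()
            self.shm.unlink()


def train(num_steps, ckpt_every, out_dir):
    state = {"w": [8.0, 4.0]}
    ckpt = AsyncCheckpointer(out_dir)
    try:
        for step in range(1, num_steps + 1):
            # Forward/backward only *reads* the weights, so it can overlap staging.
            grads = [2.0 * w for w in state["w"]]  # gradient of sum(w**2)
            ckpt.wait_staging()  # the optimizer step mutates state
            state["w"] = [w - 0.25 * g for w, g in zip(state["w"], grads)]
            if step % ckpt_every == 0:
                ckpt.save(step, state)
    finally:
        ckpt.close()
    return state
```

For example, `train(6, 2, some_dir)` writes `step_2.json`, `step_4.json`, and `step_6.json`, and each file holds the weights as they were at that step. The sketch uses a single staging buffer, so `save` blocks until the previous checkpoint is on disk. A double-buffered variant would remove that stall at the cost of twice the shared memory. The key invariant is that `wait_staging()` runs before anything mutates the state being staged.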