TransformerEngine
FSDP2 Deadlock with fp8_autocast
Using FSDP2 together with fp8_autocast appears to deadlock during the first forward pass through a te.Linear layer.
Removing the fp8_autocast context makes the deadlock go away.
Using:
- PyTorch version: 2.7.0.dev20250305+cu126
- TE version: 2.1.0+8eb1712
FYI: the FSDP2 test in the TransformerEngine codebase does not apply fp8_autocast. It creates an fp8_recipe but never uses it, so the test does not exercise FSDP2 together with FP8.
Thanks!