FSDP2 Deadlock with fp8_autocast

Open · cassanof opened this issue 6 months ago · 1 comment

Using FSDP2 with fp8_autocast seems to deadlock at the first forward pass through a te.Linear. Removing the autocast makes the deadlock go away.
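For readers who want to try this, here is a minimal sketch of the setup described above. It is not the reporter's script: the layer sizes, the DelayedScaling recipe, the bf16 dtype, the USE_FP8 environment variable, and the run_one_step helper are all assumptions made for illustration. It assumes a torchrun launch with at least two GPUs.

```python
import os


def run_one_step(use_fp8: bool = True) -> None:
    """Run one forward/backward pass through an FSDP2-sharded te.Linear."""
    # Heavy imports stay inside the function so the module can be loaded
    # on a machine without GPUs or TransformerEngine installed.
    import torch
    import torch.distributed as dist
    import transformer_engine.pytorch as te
    from torch.distributed.fsdp import fully_shard  # FSDP2 entry point
    from transformer_engine.common.recipe import DelayedScaling, Format

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # Hypothetical sizes, chosen as multiples of 16 to suit FP8 GEMMs.
    model = te.Linear(1024, 1024, params_dtype=torch.bfloat16)
    fully_shard(model)

    # Assumed recipe; the report does not say which one was used.
    recipe = DelayedScaling(fp8_format=Format.HYBRID)
    x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)

    # The reported hang is at this first forward pass when use_fp8 is True.
    with te.fp8_autocast(enabled=use_fp8, fp8_recipe=recipe):
        y = model(x)
    y.float().sum().backward()

    torch.cuda.synchronize()
    print(f"rank {dist.get_rank()}: step finished (use_fp8={use_fp8})")
    dist.destroy_process_group()


# Only run when launched by torchrun, which sets LOCAL_RANK.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    run_one_step(use_fp8=os.environ.get("USE_FP8", "1") == "1")
```

Launched with something like `torchrun --nproc_per_node=2 repro.py` (file name hypothetical), setting `USE_FP8=0` gives the no-autocast path to compare against.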

Using:

  • PyTorch version: 2.7.0.dev20250305+cu126
  • TE version: 2.1.0+8eb1712

FYI: The FSDP2 test in the TransformerEngine codebase does not apply fp8_autocast. It creates an fp8_recipe but never uses it.

Thanks!

cassanof, Apr 21 '25 07:04