FSDP2 Deadlock with fp8_autocast

Open cassanof opened this issue 7 months ago • 1 comments

Using FSDP2 with fp8_autocast appears to deadlock on the first forward pass through a te.Linear. Removing the autocast makes the deadlock go away.

Using:

  • PyTorch version: 2.7.0.dev20250305+cu126
  • TE version: 2.1.0+8eb1712
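For readers who want to try this, here is a rough sketch of the kind of setup described above. It is not the reporter's script: the layer sizes, the `USE_FP8` environment variable, the `fp8_context` helper, and the choice of a `DelayedScaling` recipe are all assumptions made for illustration, and it may or may not reproduce the hang on a given machine. It assumes a `torchrun` launch on at least two GPUs with FP8 support.

```python
import contextlib
import os


def fp8_context(enabled: bool):
    """Hypothetical helper: TE's fp8_autocast when enabled, otherwise a no-op."""
    if not enabled:
        return contextlib.nullcontext()
    # Imported lazily so the non-FP8 path does not touch the FP8 recipe code.
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    # Assumption: the issue does not say which recipe was used.
    recipe = DelayedScaling(fp8_format=Format.HYBRID)
    return te.fp8_autocast(enabled=True, fp8_recipe=recipe)


def main():
    import torch
    import torch.distributed as dist
    import transformer_engine.pytorch as te
    from torch.distributed.fsdp import fully_shard  # FSDP2 entry point

    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # A single te.Linear wrapped with FSDP2; the sizes are arbitrary.
    model = te.Linear(1024, 1024, params_dtype=torch.bfloat16, device="cuda")
    fully_shard(model)

    x = torch.randn(32, 1024, dtype=torch.bfloat16, device="cuda")
    use_fp8 = os.environ.get("USE_FP8", "1") == "1"
    with fp8_context(use_fp8):
        out = model(x)  # the report says the hang occurs here when FP8 is on
    out.sum().backward()
    torch.cuda.synchronize()
    print(f"rank {dist.get_rank()}: finished (fp8={use_fp8})")
    dist.destroy_process_group()


# Only run the distributed part when launched by torchrun, which sets LOCAL_RANK.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

Assuming the file is saved as `repro.py` (a made-up name), `torchrun --nproc_per_node=2 repro.py` would exercise the FP8 path, and `USE_FP8=0 torchrun --nproc_per_node=2 repro.py` the path without autocast, so the two behaviours can be compared from the same script.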

FYI: The FSDP2 test in the TransformerEngine codebase does not apply fp8_autocast. It creates an fp8_recipe but never uses it.

Thanks!

cassanof avatar Apr 21 '25 07:04 cassanof

@denera could you take a look?

ptrendx avatar Apr 25 '25 00:04 ptrendx

@ptrendx

This test indeed appears to be broken. I've submitted #2105 to hopefully fix this.

ntenenz avatar Aug 24 '25 07:08 ntenenz