TransformerEngine FSDP2 Deadlock with fp8

FSDP2 Deadlock with fp8_autocast

Open cassanof opened this issue 7 months ago • 1 comments

Using FSDP2 together with fp8_autocast seems to deadlock on the first forward pass through a te.Linear. If we remove the autocast and leave everything else unchanged, the deadlock goes away.
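For anyone trying to see where each rank is stuck, one option is a small watchdog around the forward pass. This is only a sketch built on Python's standard `faulthandler` module; `hang_watchdog` is a hypothetical helper name (not part of PyTorch or TE), and it only dumps Python-level stacks, so a hang inside a native collective will show up as the Python call that entered it.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s: float, file=None):
    """Dump the Python stack of every thread if the body takes longer than timeout_s.

    Hypothetical helper for debugging hangs; `file` must expose a real
    file descriptor (fileno()), which is what faulthandler writes to.
    """
    target = sys.stderr if file is None else file
    # Arm the timer: after timeout_s seconds, tracebacks are written to `target`.
    # exit=False keeps the process alive so it can still be inspected.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the timer once the body has finished (or raised).
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect call, for example `with hang_watchdog(120): out = model(x)`, should print one traceback per thread on every rank that has not returned after two minutes.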

Using:

PyTorch version: 2.7.0.dev20250305+cu126
TE version: 2.1.0+8eb1712
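A minimal sketch of the kind of script that exercises this path is below. It is not the exact code that deadlocked: the layer sizes, the `DelayedScaling` recipe, the bf16 dtype, and the helper names `make_autocast` and `main` are all assumptions made for illustration. It is meant to be launched with `torchrun` on at least two GPUs, with `--fp8` toggling the autocast.

```python
"""Hypothetical repro sketch: FSDP2 + te.Linear, with and without fp8_autocast.

Example launch (assumed): torchrun --nproc_per_node=2 repro.py --fp8
"""
import argparse
import contextlib
import os


def make_autocast(use_fp8: bool):
    """Return the context manager that wraps the forward pass."""
    if not use_fp8:
        # Baseline: no autocast at all (the configuration reported to work).
        return contextlib.nullcontext()
    import transformer_engine.pytorch as te
    from transformer_engine.common.recipe import DelayedScaling, Format

    # Assumption: a default-style delayed-scaling recipe; the original
    # report does not say which recipe was used.
    recipe = DelayedScaling(fp8_format=Format.HYBRID)
    return te.fp8_autocast(enabled=True, fp8_recipe=recipe)


def main(use_fp8: bool) -> None:
    import torch
    import torch.distributed as dist
    import transformer_engine.pytorch as te
    from torch.distributed.fsdp import fully_shard  # FSDP2 entry point

    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Sizes are arbitrary but kept as multiples of 16 for FP8 GEMMs.
    model = te.Linear(1024, 1024, params_dtype=torch.bfloat16)
    fully_shard(model)

    x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)
    with make_autocast(use_fp8):
        out = model(x)  # the reported hang is at this first forward pass
    out.sum().backward()
    torch.cuda.synchronize()
    print(f"rank {dist.get_rank()}: step finished (fp8={use_fp8})")
    dist.destroy_process_group()


# Only run when launched under torchrun, which sets LOCAL_RANK.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    parser = argparse.ArgumentParser()
    parser.add_argument("--fp8", action="store_true")
    main(parser.parse_args().fp8)
```

Running it once without `--fp8` and once with it keeps everything else identical, so any difference in behavior can be attributed to the autocast.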

FYI: the FSDP2 test in the Transformer Engine codebase never applies fp8_autocast. It creates an fp8_recipe but does not use it, so the test does not exercise this path.

Thanks!

Apr 21 '25 07:04 cassanof

@denera could you take a look?

Apr 25 '25 00:04 ptrendx

@ptrendx

This test indeed appears to be broken. I've submitted #2105 to hopefully fix this.

Aug 24 '25 07:08 ntenenz
