torchtitan

FSDP 2 doesn't pad tensors?

Open cassanof opened this issue 10 months ago • 4 comments

Hi, I ran my model with FSDP 2. One of the linear layers has a dimension that is not divisible by the world size (128), so I got the following error:

torch.Size([...]) is not divisible by FSDP world size 128.

FSDP 1 circumvents this issue by padding the tensors. Is this not supported by FSDP 2? If not, will it be supported?
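
To make the padding idea concrete, here is a minimal sketch of the arithmetic only. It is not FSDP's actual implementation. The helper names `padded_dim` and `shard_sizes` and the dimension `1000` are hypothetical, chosen for illustration; only the world size of 128 comes from my setup.

```python
def padded_dim(dim: int, world_size: int) -> int:
    """Return the smallest multiple of world_size that is >= dim."""
    # Ceiling division using integer arithmetic only.
    return -(-dim // world_size) * world_size


def shard_sizes(dim: int, world_size: int) -> tuple[int, int]:
    """Return (elements per rank, padding elements added) for an even split."""
    padded = padded_dim(dim, world_size)
    return padded // world_size, padded - dim


if __name__ == "__main__":
    # Hypothetical dimension that is not divisible by the world size.
    per_rank, padding = shard_sizes(1000, 128)
    print(f"per-rank shard: {per_rank}, padding added: {padding}")
```

With padding like this, every rank would hold a shard of the same size, and the extra elements would be ignored when the full tensor is reassembled.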

cassanof, Dec 29 '24 21:12