torchtitan
FSDP 2 doesn't pad tensors?
Hi, I ran my model with FSDP 2. One of the linear layers has a dimension that is not divisible by the world size (128), so I got the following error:
torch.Size([...]) is not divisible by FSDP world size 128.
FSDP 1 circumvents this issue by padding the tensors. Is this not supported by FSDP 2? If not, will it be supported?
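
To make the arithmetic concrete, here is a minimal sketch of the padding I have in mind. The dimension size 1000 and the helper name `padded_size` are made up for illustration; this only shows the rounding-up calculation and is not how either FSDP version is actually implemented.

```python
def padded_size(dim: int, world_size: int) -> tuple[int, int]:
    """Hypothetical helper: round `dim` up to the next multiple of `world_size`.

    Returns (padded_dim, pad_amount).
    """
    pad = (-dim) % world_size  # 0 when dim is already divisible
    return dim + pad, pad


world_size = 128  # from the error above
dim = 1000        # made-up size; the real shape is elided in the error

remainder = dim % world_size
padded, pad = padded_size(dim, world_size)

print(f"remainder: {remainder}")                  # non-zero, so the check fails
print(f"padded size: {padded} (pad by {pad})")
print(f"per-rank shard: {padded // world_size}")
```

With padding like this, every rank would hold an equally sized shard, and the extra elements would be ignored when the full tensor is reconstructed.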