pytorch.github.io
Wrong Code in the FSDP blog
📚 Documentation
In the blog introducing the FSDP API, the example reads:
fsdp_model = FullyShardedDataParallel(
model(),
fsdp_auto_wrap_policy=default_auto_wrap_policy,
cpu_offload=CPUOffload(offload_params=True),
)
It should be model instead of model() inside FullyShardedDataParallel, so it should read:
fsdp_model = FullyShardedDataParallel(
model,
fsdp_auto_wrap_policy=default_auto_wrap_policy,
cpu_offload=CPUOffload(offload_params=True),
)
It looks a little confusing and maybe could be written more clearly, but I think that's actually correct. If it was just model, it would be trying to FSDP-wrap the DDP model. By using model(), it's FSDP-wrapping a new model instance.
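For reference, a minimal sketch of the pattern under discussion, assuming model is a small nn.Module class (so model() constructs a fresh, unwrapped instance). The class definition and layer sizes are illustrative, not taken from the blog, and the fsdp_auto_wrap_policy / default_auto_wrap_policy names are the ones the blog snippet uses (later PyTorch releases renamed them to auto_wrap_policy / size_based_auto_wrap_policy).

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel, CPUOffload
from torch.distributed.fsdp.wrap import default_auto_wrap_policy

# Illustrative module class; because `model` is a class here, calling
# model() builds a new, unwrapped instance to hand to FSDP.
class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(8, 4)
        self.layer2 = nn.Linear(4, 16)
        self.layer3 = nn.Linear(16, 4)

    def forward(self, x):
        return self.layer3(self.layer2(self.layer1(x)))

# A process group must already be initialized (e.g. via
# torch.distributed.init_process_group under torchrun) before this call.
fsdp_model = FullyShardedDataParallel(
    model(),
    fsdp_auto_wrap_policy=default_auto_wrap_policy,
    cpu_offload=CPUOffload(offload_params=True),
)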