Add compiled autograd tutorial
Fixes #3034
Description
Add a tutorial for Compiled Autograd, a PyTorch 2.4 feature.
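For context, here is a minimal sketch of the kind of usage the tutorial covers, assuming the PyTorch 2.4 API where compiled autograd is opted into through the `torch._dynamo.config.compiled_autograd` flag:

```python
import torch

# Assumption: in PyTorch 2.4, compiled autograd is enabled via this dynamo config flag.
torch._dynamo.config.compiled_autograd = True

model = torch.nn.Linear(10, 10)
x = torch.randn(10)

@torch.compile
def train(model, x):
    loss = model(x).sum()
    loss.backward()  # the backward pass gets captured by compiled autograd

train(model, x)
```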
Checklist
- [x] The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
- [x] Only one issue is addressed in this pull request
- [x] Labels from the issue that this PR is fixing are added to this pull request
- [x] No unnecessary issues are included in this pull request
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3026
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 14c74990f98fcd84641cc0a6831d745f446d03b7 with merge base 19fffdae0898df300f84dce911099bd589722336:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
There is still an issue on line 54 that prevents the tutorial from building. Can you check?
Hi @svekars, how can I merge the PR?
@xmfan can you add a customcarditem and an entry in a toctree in the index.rst
Hi, I am curious about the progress of CompiledAutograd. After a series of DDP-related optimizations:
- [DDP] Use compiled_autograd to trace DDP backward allreduce #110662
- [DDP][PT2D] Allreduce fusion fx pass using concat and all_reduce_coalesced #113209
Does the model get compute-communication overlap when CompiledAutograd is enabled? Is there any recent progress on DDP support for CompiledAutograd? Can a multi-GPU model with CompiledAutograd now perform better than torch.compile with DDPOptimizer? Thanks!
@yitingw1 When enabling CompiledAutograd, we should also enable the new CompiledDDP; right now it is not enabled automatically. As for overlap, the answer is yes if the new CompiledDDP is enabled. The overlapping strategy is the same as with DDPOptimizer and eager DDP. The difference from DDPOptimizer is that the new CompiledDDP, with help from CompiledAutograd, should produce fewer (if not zero) graph breaks. However, whether it performs better depends on the model.
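For anyone following along, a rough sketch of how these pieces could be wired together. The `torch._dynamo.config.compiled_autograd` flag is the one the tutorial uses; the `optimize_ddp = "python_reducer"` setting for the new CompiledDDP path is my assumption based on the discussion above, so please verify the exact option name against the build you are running:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Assumption: compiled autograd is toggled via this dynamo config flag (PyTorch 2.4).
    torch._dynamo.config.compiled_autograd = True
    # Assumption: the new CompiledDDP / python reducer path is not on by default and is
    # selected with this setting; check the config name in your build.
    torch._dynamo.config.optimize_ddp = "python_reducer"

    model = DDP(torch.nn.Linear(10, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    @torch.compile
    def step(x):
        loss = model(x).sum()
        loss.backward()  # backward (including DDP's gradient allreduce) traced by compiled autograd

    for _ in range(3):
        opt.zero_grad()
        step(torch.randn(8, 10))
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Launching with `torchrun` would work just as well; `mp.spawn` is used here only to keep the sketch self-contained on CPU with the gloo backend.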
@svekars can we do that in a separate PR? We don't even have a compile section right now, and I'd like to add the torch.compile tutorial as well as the TORCH_LOGS one.
New description looks good to me!