Add compiled autograd tutorial
Fixes #3034
Description
Add a tutorial for Compiled Autograd, a PyTorch 2.4 feature.
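For context, here is a minimal sketch of the kind of usage the tutorial covers, assuming the PyTorch 2.4 API where compiled autograd is opted into through the `torch._dynamo.config.compiled_autograd` flag:

```python
import torch

# Assumption: in PyTorch 2.4, compiled autograd is enabled via this dynamo config flag.
torch._dynamo.config.compiled_autograd = True

model = torch.nn.Linear(10, 10)
x = torch.randn(10)

@torch.compile
def train(model, x):
    loss = model(x).sum()
    loss.backward()  # the backward pass gets captured by compiled autograd

train(model, x)
```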
Checklist
- [x] The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
- [x] Only one issue is addressed in this pull request
- [x] Labels from the issue that this PR is fixing are added to this pull request
- [x] No unnecessary issues are included in this pull request
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3026
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit 14c74990f98fcd84641cc0a6831d745f446d03b7 with merge base 19fffdae0898df300f84dce911099bd589722336:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
There is still an issue on line 54 that prevents the tutorial from building. Can you check?
Hi @svekars, how can I merge the PR?
@xmfan can you add a customcarditem and an entry in a toctree in the index.rst
Hi, I am curious about the progress of CompiledAutograd. After a series of DDP-related optimizations:
- [DDP] Use compiled_autograd to trace DDP backward allreduce #110662
- [DDP][PT2D] Allreduce fusion fx pass using concat and all_reduce_coalesced #113209
Does the model get compute-communication overlap when CompiledAutograd is enabled? Is there any recent progress on DDP support for CompiledAutograd? Can a multi-GPU model with CompiledAutograd now perform better than torch.compile with DDPOptimizer? Thanks!
@yitingw1 When enabling CompiledAutograd, we should also enable the new CompiledDDP; right now it is not enabled automatically. As for overlap, the answer is yes if the new CompiledDDP is enabled. The overlapping strategy is the same as with DDPOptimizer and eager DDP. The difference from DDPOptimizer is that the new CompiledDDP, with help from CompiledAutograd, should produce fewer (if not zero) graph breaks. However, whether it performs better depends on the model.
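For anyone following along, a rough sketch of how these pieces could be wired together. The `torch._dynamo.config.compiled_autograd` flag is the one the tutorial uses; the `optimize_ddp = "python_reducer"` setting for the new CompiledDDP path is my assumption based on the discussion above, so please verify the exact option name against the build you are running:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Assumption: compiled autograd is toggled via this dynamo config flag (PyTorch 2.4).
    torch._dynamo.config.compiled_autograd = True
    # Assumption: the new CompiledDDP / python reducer path is not on by default and is
    # selected with this setting; check the config name in your build.
    torch._dynamo.config.optimize_ddp = "python_reducer"

    model = DDP(torch.nn.Linear(10, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    @torch.compile
    def step(x):
        loss = model(x).sum()
        loss.backward()  # backward (including DDP's gradient allreduce) traced by compiled autograd

    for _ in range(3):
        opt.zero_grad()
        step(torch.randn(8, 10))
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Launching with `torchrun` would work just as well; `mp.spawn` is used here only to keep the sketch self-contained on CPU with the gloo backend.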
@svekars can we do that in a separate PR? We don't even have a compile section right now, and I'd like to add the torch.compile tutorial as well as the TORCH_LOGS one.
New description looks good to me!