pytorch [doc] Add documentation for division by zero behavior in autograd

Fixes #128796

This PR adds documentation about the behavior of division by zero operations in PyTorch's autograd system. The documentation:

- Explains how division by zero produces inf values following IEEE-754 floating point arithmetic
- Explains how autograd handles these cases and why masking after division can lead to nan gradients
- Provides concrete examples showing the issue
- Recommends two solutions:
  - Masking before division
  - Using MaskedTensor (experimental API)
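
For reference, here is a minimal sketch of the masking-after versus masking-before behavior summarized above. The function names are illustrative only and are not part of the added docs. It assumes a divisor tensor that does not require grad.

```python
import torch


def grad_mask_after_division():
    # Divide first, then mask: the forward value at index 0 is inf.
    x = torch.tensor([1.0, 1.0], requires_grad=True)
    div = torch.tensor([0.0, 1.0])
    y = x / div              # forward: [inf, 1.]
    mask = div != 0          # [False, True]
    loss = y[mask].sum()
    loss.backward()
    # The masked-out element receives an upstream gradient of 0, but the
    # backward of division computes 0 / 0 for it, which is nan.
    return x.grad


def grad_mask_before_division():
    # Mask first, then divide: the zero divisor never enters the graph.
    x = torch.tensor([1.0, 1.0], requires_grad=True)
    div = torch.tensor([0.0, 1.0])
    mask = div != 0
    loss = (x[mask] / div[mask]).sum()
    loss.backward()
    return x.grad


print(grad_mask_after_division())   # tensor([nan, 1.])
print(grad_mask_before_division())  # tensor([0., 1.])
```

The nan in the first case comes from the backward pass, not from the loss value itself, which is why it is easy to miss.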

The documentation is added to the autograd notes section, making it easily discoverable for users who encounter this common issue.

This addresses issue #128796, which requested better documentation of this behavior to help users avoid common pitfalls when dealing with division by zero in their models.

Additional changes:

Fixed formatting consistency by replacing curly apostrophes with straight apostrophes in the existing documentation

cc @svekars @sekyondaMeta @AlannaBurke @ezyang @albanD @gqchen @nikitaved @soulitzer @Varal7 @xmfan

Jun 14 '25 09:06 Juliandlb

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/155987

:page_facing_up: Preview Python docs built from this PR
:page_facing_up: Preview C++ docs built from this PR
:question: Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit 6dd35a9f991e04efbeca144f01c2e4f80e969561 with merge base 670dab6c630552b32189911f22896ec453e55ab7: :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Jun 14 '25 09:06 pytorch-bot[bot]

Didn't find following labels among repository labels: release notes: documentation

Jun 14 '25 09:06 pytorch-bot[bot]

@pytorchbot label "topic: not user facing"

Jun 14 '25 09:06 Juliandlb

All required labels are set and checks are passing. CI is now waiting for maintainer approval.
Let me know if anything else is needed!
cc @svekars @sekyondaMeta @AlannaBurke

Jun 14 '25 09:06 Juliandlb

A bit related:

https://github.com/pytorch/pytorch/issues/50122

Jun 14 '25 09:06 vadimkantorov

@pytorchbot merge

Jun 16 '25 16:06 soulitzer

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Jun 16 '25 16:06 pytorchmergebot

Maybe the same mitigations / workarounds would also help for the notorious torch.where use case:

https://github.com/pytorch/pytorch/issues/156212

I think people have also used custom autograd functions or hooks to patch up the gradient or output...

I think the docs should at least provide a copy-pasteable workaround
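
As an illustration of what such a workaround could look like for torch.where, here is a minimal sketch of the commonly used "double where" pattern. It is not taken from the linked issue or from the merged docs, and `safe_divide` is a hypothetical helper name. The idea is to replace zero divisors with a harmless value before dividing, so the backward pass never evaluates 0 / 0.

```python
import torch


def safe_divide(x, div):
    """Return x / div where div != 0, and 0 elsewhere, with finite gradients.

    Hypothetical helper for illustration only; not a PyTorch API.
    """
    mask = div != 0
    # First where: swap zero divisors for 1 so the division itself is safe.
    safe_div = torch.where(mask, div, torch.ones_like(div))
    # Second where: discard the placeholder results at the masked positions.
    return torch.where(mask, x / safe_div, torch.zeros_like(x))


x = torch.tensor([1.0, 1.0], requires_grad=True)
div = torch.tensor([0.0, 1.0])
safe_divide(x, div).sum().backward()
print(x.grad)  # tensor([0., 1.])
```

A single `torch.where(mask, x / div, 0)` would still give nan for `x.grad[0]`, because the division is evaluated (and differentiated) for every element before the selection is applied.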

Jun 17 '25 21:06 vadimkantorov
