[doc] Add documentation for division by zero behavior in autograd

Open Juliandlb opened this issue 5 months ago • 5 comments

Fixes #128796

This PR adds documentation about the behavior of division by zero in PyTorch's autograd system. The documentation:

  1. Explains how division by zero produces inf values, following IEEE-754 floating-point arithmetic
  2. Explains how autograd handles these cases and why masking after the division can lead to nan gradients
  3. Provides concrete examples showing the issue
  4. Recommends two solutions:
    • Masking before division
    • Using MaskedTensor (experimental API)
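
For readers of this thread, the behavior summarized in the list above can be reproduced with a minimal sketch like the following. The tensor names (`x_after`, `x_before`, `div`, `mask`) are illustrative choices and are not taken from the PR diff; the sketch covers only the masking-before-division fix, not MaskedTensor.

```python
import torch

div = torch.tensor([0.0, 1.0])
mask = div != 0  # tensor([False, True])

# Masking AFTER the division: the forward value is filtered correctly, but the
# backward pass of the division still runs on the masked-out element. Its
# upstream gradient is 0 and the divisor is 0, so autograd computes 0 / 0.
x_after = torch.tensor([1.0, 1.0], requires_grad=True)
y = x_after / div              # forward result contains inf at index 0
y[mask].sum().backward()
print(x_after.grad)            # index 0 is nan

# Masking BEFORE the division: the zero divisor never enters the graph,
# so the masked-out element simply receives a zero gradient.
x_before = torch.tensor([1.0, 1.0], requires_grad=True)
(x_before[mask] / div[mask]).sum().backward()
print(x_before.grad)           # tensor([0., 1.])
```

The design point is that masking the output does not mask the gradient computation: the division's backward formula is still evaluated for every element that took part in the forward division.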

The documentation is added to the autograd notes section, making it easily discoverable for users who encounter this common issue.

This addresses the original issue #128796, which requested better documentation of this behavior to help users avoid common pitfalls when dealing with division by zero in their models.

Additional changes:

  • Fixed formatting consistency by replacing curly apostrophes with straight apostrophes in the existing documentation

cc @svekars @sekyondaMeta @AlannaBurke @ezyang @albanD @gqchen @nikitaved @soulitzer @Varal7 @xmfan

Juliandlb avatar Jun 14 '25 09:06 Juliandlb

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/155987

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: No Failures

As of commit 6dd35a9f991e04efbeca144f01c2e4f80e969561 with merge base 670dab6c630552b32189911f22896ec453e55ab7: :green_heart: Looks good so far! There are no failures yet. :green_heart:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot[bot] avatar Jun 14 '25 09:06 pytorch-bot[bot]

Didn't find following labels among repository labels: release notes: documentation

pytorch-bot[bot] avatar Jun 14 '25 09:06 pytorch-bot[bot]

@pytorchbot label "topic: not user facing"

Juliandlb avatar Jun 14 '25 09:06 Juliandlb

All required labels are set and checks are passing. CI is now waiting for maintainer approval.
Let me know if anything else is needed!
cc @svekars @sekyondaMeta @AlannaBurke

Juliandlb avatar Jun 14 '25 09:06 Juliandlb

A bit related:

  • https://github.com/pytorch/pytorch/issues/50122

vadimkantorov avatar Jun 14 '25 09:06 vadimkantorov

@pytorchbot merge

soulitzer avatar Jun 16 '25 16:06 soulitzer

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging Check the merge workflow status here

pytorchmergebot avatar Jun 16 '25 16:06 pytorchmergebot

Maybe the same mitigations / workarounds would also help with the notorious torch.where use case:

  • https://github.com/pytorch/pytorch/issues/156212

I think people have also used custom autograd functions or hooks to patch up the gradient or output...

I think the docs should at least provide a copy-pasteable workaround.

vadimkantorov avatar Jun 17 '25 21:06 vadimkantorov
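
One commonly used pattern for the torch.where case raised in the comment above is to sanitize the divisor before dividing, so that no inf or nan is ever produced in the graph. The sketch below is an assumption about what such a copy-pasteable workaround could look like; the helper name `safe_div` is hypothetical and is not part of PyTorch or of this PR.

```python
import torch

def safe_div(num, den):
    """Hypothetical helper: returns num / den, with 0 where den == 0."""
    mask = den != 0
    # Replace zero divisors with a harmless value before dividing.
    safe_den = torch.where(mask, den, torch.ones_like(den))
    # Select the real quotient only where the original divisor was nonzero.
    return torch.where(mask, num / safe_den, torch.zeros_like(num))

x = torch.tensor([1.0, 1.0], requires_grad=True)
div = torch.tensor([0.0, 1.0])

# Naive version: torch.where hides the inf in the forward pass, but the
# division's backward still evaluates 0 / 0 for the masked-out element.
torch.where(div != 0, x / div, torch.zeros_like(x)).sum().backward()
naive_grad = x.grad.clone()

# Sanitized version: gradients stay finite.
x.grad = None
safe_div(x, div).sum().backward()
safe_grad = x.grad.clone()

print(naive_grad)  # index 0 is nan
print(safe_grad)   # tensor([0., 1.])
```

Using `torch.where` twice keeps the operation differentiable and shape-preserving, which masking by boolean indexing does not.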