Fix linkcheck anchor encoding issue

Open • nrdlngr opened this issue 5 months ago • 1 comment

Fix linkcheck anchor encoding issue (#13620)

Description

This PR fixes an issue where the linkcheck builder incorrectly reports "Anchor not found" errors for URLs with percent-encoded characters in their fragment identifiers (anchors), even though these URLs work correctly in web browsers.

Current Behavior

When encountering a URL with percent-encoded characters in the anchor/fragment (e.g., https://example.com/page#standard-input%2Foutput-stdio), the linkcheck builder:

  1. Extracts the fragment: standard-input%2Foutput-stdio
  2. Decodes it to: standard-input/output-stdio
  3. Searches for an HTML element with id="standard-input/output-stdio" or name="standard-input/output-stdio"
  4. Reports a broken link when the element isn't found, even though the URL works in browsers
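
The four steps above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code in Sphinx: SimpleAnchorParser and decoded_anchor_found are hypothetical names, and the sample HTML assumes a page whose heading id is stored in percent-encoded form.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class SimpleAnchorParser(HTMLParser):
    """Hypothetical stand-in for an anchor checker that knows one exact anchor."""

    def __init__(self, search_anchor: str) -> None:
        super().__init__()
        self.search_anchor = search_anchor
        self.found = False

    def handle_starttag(self, tag, attrs):
        # An anchor target is any element whose id or name equals the anchor.
        for key, value in attrs:
            if key in ("id", "name") and value == self.search_anchor:
                self.found = True


def decoded_anchor_found(url: str, html: str) -> bool:
    fragment = urlsplit(url).fragment      # step 1: extract the raw fragment
    decoded = unquote(fragment)            # step 2: percent-decode it
    parser = SimpleAnchorParser(decoded)   # step 3: search for the decoded form only
    parser.feed(html)
    parser.close()
    return parser.found                    # step 4: False is reported as a broken link


URL = "https://example.com/page#standard-input%2Foutput-stdio"
ENCODED_PAGE = '<h2 id="standard-input%2Foutput-stdio">stdio</h2>'
DECODED_PAGE = '<h2 id="standard-input/output-stdio">stdio</h2>'
```

With these inputs, decoded_anchor_found(URL, DECODED_PAGE) succeeds, while decoded_anchor_found(URL, ENCODED_PAGE) fails because the page stores the id in its encoded form and only the decoded form is searched for.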

Changes Made

  • Enhanced AnchorCheckParser to check for multiple variants of the anchor:
    • The decoded version (current behavior)
    • The original encoded version
    • A re-encoded version, if the decoded version contains characters that require percent-encoding
  • Added comprehensive tests to verify the new behavior
  • Updated the contains_anchor function to accept both decoded and original encoded anchors
  • Added an entry to CHANGES.rst
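
A minimal sketch of the multi-variant check described in the list above, using only the standard library. The names anchor_variants, VariantAnchorParser and anchor_found are hypothetical and chosen for illustration. The actual changes to AnchorCheckParser and contains_anchor may be structured differently, and quoting with safe="" is an assumption about how the re-encoded variant could be produced.

```python
from html.parser import HTMLParser
from urllib.parse import quote, unquote


def anchor_variants(fragment: str) -> set[str]:
    """Return the anchor spellings worth searching for (hypothetical helper)."""
    decoded = unquote(fragment)
    return {
        decoded,                  # the decoded version
        fragment,                 # the original encoded version
        quote(decoded, safe=""),  # a re-encoded version of the decoded text
    }


class VariantAnchorParser(HTMLParser):
    """Matches an element whose id or name equals any of the variants."""

    def __init__(self, variants: set[str]) -> None:
        super().__init__()
        self.variants = variants
        self.found = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value in self.variants:
                self.found = True


def anchor_found(fragment: str, html: str) -> bool:
    parser = VariantAnchorParser(anchor_variants(fragment))
    parser.feed(html)
    parser.close()
    return parser.found
```

Collecting the variants in a set removes duplicates, so a fragment with no encoded characters still results in a single comparison per attribute.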

Testing Done

  • Added unit tests for the AnchorCheckParser class
  • Added integration tests with a mock HTTP server that serves HTML with encoded anchors
  • Verified that all tests pass with the new implementation
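
For readers unfamiliar with this style of test, the following is a rough, self-contained sketch of an integration check against a mock HTTP server. It is not the test code from this PR: EncodedAnchorHandler, IdCollector and check_anchor_over_http are hypothetical names, and the page content is an assumed example. The server binds to an ephemeral port on 127.0.0.1 and proxies are disabled so that the request stays local.

```python
import threading
from html.parser import HTMLParser
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote, unquote, urlsplit
from urllib.request import ProxyHandler, build_opener

PAGE = b'<html><body><h2 id="standard-input%2Foutput-stdio">stdio</h2></body></html>'


class EncodedAnchorHandler(BaseHTTPRequestHandler):
    """Serves one fixed page whose heading id is percent-encoded."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class IdCollector(HTMLParser):
    """Collects every id and name attribute value in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value is not None:
                self.ids.add(value)


def check_anchor_over_http(fragment: str) -> bool:
    server = HTTPServer(("127.0.0.1", 0), EncodedAnchorHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/page#{fragment}"
        opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
        # The fragment is stripped client-side and never sent to the server.
        with opener.open(url, timeout=5) as response:
            html = response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    raw = urlsplit(url).fragment
    decoded = unquote(raw)
    # Accept the decoded, original, or re-encoded spelling of the anchor.
    return bool({decoded, raw, quote(decoded, safe="")} & collector.ids)
```

Calling check_anchor_over_http("standard-input%2Foutput-stdio") exercises the encoded-anchor case, and a fragment that does not appear in the page exercises the genuinely broken case.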

Fixes

Fixes #13620

nrdlngr • Jun 06 '25 02:06

After some initial confusion (documented to some extent in the linked issue thread #13620), I'm now supportive of this functionality, and would like to see this merged. I would like a few adjustments/refactorings to be made before then, though.

jayaddison • Jun 13 '25 10:06