Proposal: add new builder that checks if hardcoded URLs can be replaced with crossrefs
Subject: (copied from https://github.com/pytest-dev/pytest/pull/9082)
This PR adds a new no-op Sphinx builder named crossrefcheck that checks whether a hardcoded URL in the text can be replaced with a crossreference from one of the inventories configured in intersphinx_mapping.
Feature or Bugfix
- Feature
Purpose
If a hardcoded link can be replaced with a crossref, crossrefcheck will emit a warning like this:
pytest/doc/en/reference/reference.rst:127: WARNING: hardcoded link 'https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual' could be replaced by a cross-reference to 'python' inventory (try using ':py:meth:`unittest.TestCase.assertAlmostEqual`' instead)
If a replacement suggestion is printed, the hardcoded URL can be safely replaced with that suggestion.
Usage
$ sphinx-build -b crossrefcheck doc build
Caveats
The builder is not able to catch the URLs for the internal docs. This is because no information is known about the URL the docs will be hosted at. Thus, if e.g. links to https://docs.pytest.org should be checked as well in the docs for the pytest project, it is best to extend intersphinx_mapping. Example:
intersphinx_mapping = {
...
"pytest-latest": ("https://docs.pytest.org/en/latest", None),
"pytest-stable": ("https://docs.pytest.org/en/stable", None),
"pytest-6": ("https://docs.pytest.org/en/6.2.x", None),
}
If a URL is reported without a suggestion, it usually indicates that either:
- a link points to a section or paragraph that doesn't have a Sphinx reference defined - the external docs should be updated then,
- or the reference was changed, so the link is outdated (but still valid!). In this case, it's usually best to check the docs source to find the new correct ref name.
Example output from running over Sphinx repository
sphinx/doc/development/tutorials/helloworld.rst:132: WARNING: hardcoded link 'https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH' could be replaced by a cross-reference to 'python' inventory (try using ':std:envvar:`PYTHONPATH`' instead)
sphinx/doc/development/tutorials/helloworld.rst:151: WARNING: hardcoded link 'https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH' could be replaced by a cross-reference to 'python' inventory (try using ':std:envvar:`PYTHONPATH`' instead)
sphinx/doc/development/tutorials/todo.rst:294: WARNING: hardcoded link 'https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH' could be replaced by a cross-reference to 'python' inventory (try using ':std:envvar:`PYTHONPATH`' instead)
sphinx/doc/examples.rst:95: WARNING: hardcoded link 'https://docs.python.org/3/' could be replaced by a cross-reference to 'python' inventory (no suggestion)
sphinx/doc/man/sphinx-apidoc.rst:17: WARNING: hardcoded link 'https://docs.python.org/3/library/fnmatch.html' could be replaced by a cross-reference to 'python' inventory (try using ':std:doc:`library/fnmatch`' instead)
sphinx/doc/usage/configuration.rst:570: WARNING: hardcoded link 'https://requests.readthedocs.io/en/master/' could be replaced by a cross-reference to 'requests' inventory (no suggestion)
sphinx/doc/usage/restructuredtext/directives.rst:1000: WARNING: hardcoded link 'https://docs.python.org/3/reference/lexical_analysis.html#identifiers' could be replaced by a cross-reference to 'python' inventory (try using ':std:ref:`identifiers`' instead)
The original proposal was made in https://github.com/pytest-dev/pytest/pull/9082, where it was suggested to add the builder to Sphinx instead, so it can be reused in multiple projects. Please note that this PR isn't ready to be merged yet (e.g. missing tests and proper documentation). Its purpose is merely to propose the builder first and check whether @tk0miya would approve the general idea.
Agreed. This must be worthy. So +1 for merging this into the core. But I'm not sure this should be added as a standalone builder or an additional feature of linkcheck builder. Is there any reason to make this standalone?
This is pretty cool. Implementation-wise I'm wondering if it is possible to have a bit more indirection such that this doesn't access the intersphinx inventory directly, but asks the intersphinx extension to lookup an URL and return the necessary information. The issue is that in order to update intersphinx to work in all cases the inventory format needs to change so it would be good if the rest of Sphinx doesn't assume a specific inventory format.
I'm not sure this should be added as a standalone builder or an additional feature of linkcheck builder. Is there any reason to make this standalone?
I would prefer a standalone crossref builder so I can run it offline in CI, where linkcheck can fail due to flaky websites or network issues.
Since the changes are added to Sphinx itself, the builder might not be necessary at all, depending on how deeply those warnings should be integrated. E.g. I could add a SphinxPostTransform directly to sphinx.ext.intersphinx and emit the warnings by default when the docs are built and the intersphinx extension is enabled. @tk0miya @Zac-HD what do you think? I have added a prototype in the latest commit where the warnings are emitted directly on doc build and no special builder is needed. This also simplifies the changes a lot (one new SphinxPostTransform, one extra utility function). The move of the code to intersphinx also addresses @jakobandersen's comment, since all the logic is now internal to intersphinx extension.
That would be fantastic for all my use-cases!
follow-up idea - for linkcheck identify links that are in locations where a intersphinx ref could be used or added (for links to other docs that use sphinx but are not declared as intersphinx
@tk0miya gentle ping for a review.
@tk0miya gentle ping. Maybe it's still not too late for this to become part of 4.4?
@hoefling are you able to resolve conflicts please?
A
@AA-Turner done :sunglasses: