HTML5 permalinks are not permanent if section header starts with number
In the pip changelog, the slugs in the HTML anchors do not permanently point to their corresponding versions. Instead, they are incremental position numbers starting at #id1, so whenever a new version of pip is released, all anchors shift and point to a different version.
To Reproduce
#!/bin/bash
DOCDIR=testnumslug
rm -rf $DOCDIR
mkdir $DOCDIR
cat <<EOF > $DOCDIR/index.rst
Hi
==
1.2.0
-----
1.1.0
-----
1.0.0
-----
EOF
# Application error:
# config directory doesn't contain a conf.py file (testnumslug)
touch $DOCDIR/conf.py
sphinx-build $DOCDIR $DOCDIR/_html
echo -e "\n-----\n"
grep -R 'Permalink' $DOCDIR/_html/index.html
This gives the following output:
<h1>Hi<a class="headerlink" href="#hi" title="Permalink to this headline">¶</a></h1>
<h2>1.2.0<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h2>
<h2>1.1.0<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h2>
<h2>1.0.0<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h2>
Expected behavior
<h1>Hi<a class="headerlink" href="#hi" title="Permalink to this headline">¶</a></h1>
<h2>1.2.0<a class="headerlink" href="#1.2.0" title="Permalink to this headline">¶</a></h2>
<h2>1.1.0<a class="headerlink" href="#1.1.0" title="Permalink to this headline">¶</a></h2>
<h2>1.0.0<a class="headerlink" href="#1.0.0" title="Permalink to this headline">¶</a></h2>
Environment info
- Python version: 3.9.1
- Sphinx version: 3.2.1
Additional context
- https://github.com/pypa/pip/issues/8152
The reason for the current behaviour is likely the HTML4 spec:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
So id="1.2.0" is technically invalid (although I suspect many browsers would handle it fine, since HTML5 loosened the restriction).
The behaviour still feels quite unintuitive to me, however. I would expect Sphinx to generate something more stable, such as id="id-1_2_0" instead.
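The HTML4 rule quoted above can be checked mechanically. This is only a minimal sketch: the pattern is transcribed from the quoted sentence, and the helper name is_valid_html4_id is made up for illustration.

```python
import re

# HTML4 rule as quoted above: one ASCII letter, then any number of
# letters, digits, hyphens, underscores, colons, and periods.
HTML4_ID = re.compile(r'[A-Za-z][A-Za-z0-9\-_:.]*')


def is_valid_html4_id(value: str) -> bool:
    """Return True if the whole value satisfies the quoted HTML4 rule."""
    return HTML4_ID.fullmatch(value) is not None


for candidate in ('1.2.0', 'id-1_2_0', 'v1_2_0'):
    print(candidate, is_valid_html4_id(candidate))
```

Under this rule '1.2.0' is rejected because it starts with a digit, while 'id-1_2_0' and 'v1_2_0' are accepted.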
With that said, you can always specify an explicit reference yourself:
Hi
==
.. _v1_2_0:
1.2.0
-----
.. _v1_1_0:
1.1.0
-----
.. _v1_0_0:
1.0.0
-----
This would always work regardless of the section title.
Sphinx has generated HTML5 by default since 2.0: https://github.com/sphinx-doc/sphinx/issues/4587
It comes from the node ID generation rule of docutils, the core library of Sphinx. The rule was defined to support many kinds of output formats. https://repo.or.cz/docutils.git/blob/HEAD:/docutils/docutils/nodes.py#l2220
What is the role of this function then?
https://github.com/sphinx-doc/sphinx/blob/3ed7590ed411bd93b26098faab4f23619cdb2267/sphinx/util/nodes.py#L435-L439
It's a local ID generator for Sphinx domains. It does not relate to the section IDs.
It looks like an override by import path, although that doesn't make it any better.
In [2]: from docutils.nodes import make_id
In [3]: make_id('1.2.0')
Out[3]: ''
In [7]: from sphinx.util.nodes import _make_id
In [8]: _make_id('1.2.0')
Out[8]: ''
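To see why the result is empty, here is a simplified sketch of the normalization, assuming the two docutils patterns quoted elsewhere in this thread ('[^a-z0-9]+' and '^[-0-9]+|-+$'). The function name simplified_make_id is hypothetical, and the real make_id does additional Unicode handling that is left out here.

```python
import re

# Patterns as quoted in this thread for docutils.
_non_id_chars = re.compile('[^a-z0-9]+')
_non_id_at_ends = re.compile('^[-0-9]+|-+$')


def simplified_make_id(string: str) -> str:
    """Approximate the ASCII part of docutils' ID normalization."""
    ident = string.lower()
    # Collapse whitespace, then turn every run of other characters into "-".
    ident = _non_id_chars.sub('-', ' '.join(ident.split()))
    # Strip leading digits/hyphens and trailing hyphens.
    return _non_id_at_ends.sub('', ident)


print(repr(simplified_make_id('Hi')))
print(repr(simplified_make_id('1.2.0')))
print(repr(simplified_make_id('3 ways to contribute')))
```

'1.2.0' first becomes '1-2-0', which consists only of digits and hyphens, so the leading-strip pattern removes all of it. With an empty result, the section falls back to a positional idN anchor.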
Is it possible to define a function html5_id(string: str) and delegate HTML id generation to it?
The same behavior occurs with non-ASCII headers, which also produce idX anchors. If a header mixes ASCII and non-ASCII characters, all non-ASCII parts are removed.
@madjxatw understood. HTML5 removes nearly all restrictions from IDs (a value only has to be non-empty and free of whitespace), which makes even these valid:
<p id="#">Foo.
<p id="##">Bar.
<p id="♥">Baz.
<p id="©">Inga.
<p id="{}">Lorem.
<p id="“‘’”">Ipsum.
<p id="⌘⌥">Dolor.
<p id="{}">Sit.
<p id="[attr=value]">Amet.
<p id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!
https://mathiasbynens.be/notes/html5-id-class
@abitrolly, exactly, so Sphinx needs to implement an HTML5 version of make_id() to stay consistent with its default HTML5 output.
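As a rough illustration of what such a function could look like: this is a sketch under the assumption that the only constraints are the HTML5 ones (non-empty, no whitespace). The name html5_id comes from the question earlier in the thread; nothing like it exists in Sphinx or docutils as far as this thread shows.

```python
def html5_id(string: str) -> str:
    """Hypothetical HTML5 slug: join whitespace-separated parts with "-".

    Everything else, including digits, dots and non-ASCII characters,
    is kept unchanged.
    """
    return '-'.join(string.split())


print(html5_id('1.2.0'))
print(html5_id('3 ways to contribute'))
print(html5_id('Hello 世界'))
```

An empty or whitespace-only title still yields an empty string, so a caller would need a fallback for that case.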
@madjxatw not Sphinx, some human needs to sit down and write the code. While the code seems trivial, right now it is unclear where to place the code.
@abitrolly, hopefully some unicode slugifier (e.g. https://github.com/mozilla/unicode-slugify) could be used as an extension or be integrated somehow into Sphinx.
Sphinx still supports HTML4 output, and the HTML Help builder also depends on HTML4. In addition, I can't say that the change would not affect other builders. Sphinx is not only for building HTML5.
@tk0miya, is it possible to have an option that lets users decide whether to enable Unicode permalinks?
I can't promise that the option would work for "all" builders. If I added it to Sphinx, I would have to describe it as "it might work, but no promises; please don't report it to us even if something breaks" :-p I don't think I can provide such an option officially. Please hack it at your own risk.
That's all right; it wouldn't be a big problem to hack it ourselves. However, it is still a bit of a pity that Unicode IDs are not officially supported, especially for non-English (e.g. East Asian) writers who really need IDs in their own language's characters. :-(
I don't know the details of how Sphinx works internally, but couldn't a custom Unicode ID maker be invoked only when the HTML5 builder is in use?
It's difficult. The node IDs are generated in the reading phase. The result of that phase is cached and used for incremental builds, so introducing new IDs would break the incremental build feature.
But HTML IDs should not be node IDs. Is it possible, during the initial read, to store IDs in a structure that allows a proper HTML5 slug to be output on writing? For example, if an ID is autogenerated from the title, store the title.
Of course, it's possible if you provide a wonderful patch! (IMO, it's impossible for me, as I said with "it's difficult" above.)
@tk0miya what do you mean by "it's difficult"? It would help if you could point to the locations where Sphinx reads and caches node IDs, and where to insert a write_html5_id call.
The cross-referencing system of Sphinx is based on the node IDs, so I can't imagine how we would replace them with Unicode IDs. I guess we would need to rewrite the whole of docutils and Sphinx, so I can't tell you where to do that.
@tk0miya the idea is not about replacing internal node IDs. It is about writing IDs in HTML5 format on output to HTML5. All IDs written this way will be consistent.
I don't know how to do that. But all contributions are welcome!
I am afraid I can go on only with funded contributions. Learning a codebase like this in my free time is not sustainable. A pity that this seemingly simple generator turned out to be that complex on the inside.
It comes from the node ID generation rule of docutils, the core library of Sphinx. The rule was defined to support many kinds of output formats.
The rationale and details of this design decision are explained in https://docutils.sourceforge.io/docs/ref/rst/directives.html#identifier-normalization. There is an open feature request for less restrictive IDs. As for the original question: since Docutils 0.18, setting an id-prefix keeps permanent IDs on section headings that start with a number, so this can provide a workaround in future Sphinx versions.
Still stumbling over this in 2025. Especially bothersome with changelogs.
Just as a demonstration: the latest release of black will always be
https://black.readthedocs.io/en/stable/change_log.html#id1
there is no way to hard link to older versions.
This should not be an issue anymore. Especially since it is valid HTML5.
There should at least be a way to overwrite this behavior.
The workaround with an id-prefix can be used with current Docutils and Sphinx:
Example
In docutils.conf, set:
[parsers]
id_prefix: black:
Then, the "self-link" for the heading 25.1.0 becomes <a class="headerlink" href="#black:25-1-0" title="Link to this heading">¶</a>.
Side-effect: all ids in the project are prefixed with black: (reStructuredText "reference names" are not affected).
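The effect of the prefix can be sketched as follows. This is not the Docutils code: it assumes the prefix is prepended to the normalized name and that leading digits are protected by temporarily putting a placeholder letter in front of the name. simplified_make_id and prefixed_id are hypothetical names, and the patterns are the ones quoted in this thread.

```python
import re

_non_id_chars = re.compile('[^a-z0-9]+')
_non_id_at_ends = re.compile('^[-0-9]+|-+$')


def simplified_make_id(string: str) -> str:
    """Approximate the ASCII part of docutils' ID normalization."""
    ident = _non_id_chars.sub('-', ' '.join(string.lower().split()))
    return _non_id_at_ends.sub('', ident)


def prefixed_id(name: str, id_prefix: str = '') -> str:
    """Sketch: with a prefix, the ID no longer has to start with a letter."""
    if id_prefix:
        # Assumption: shield leading digits with a placeholder "x",
        # normalize, then drop the placeholder again.
        base = simplified_make_id('x' + name)[1:]
    else:
        base = simplified_make_id(name)
    return id_prefix + base


print(repr(prefixed_id('25.1.0')))
print(repr(prefixed_id('25.1.0', id_prefix='black:')))
```

Without a prefix the heading 25.1.0 normalizes to an empty string; with the prefix black: it becomes black:25-1-0, matching the link shown above.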
A simpler workaround would be to use headings starting with an ASCII letter (release 25.1.0 or similar) or to provide explicit targets like
.. _release 25.1.0:
25.1.0
======
A configurable ID generation is still on the TODO list. Doing this right will break a lot of existing behaviour, so this needs to be done very carefully.
Fixing permalinks for numbered headers will be sufficient to close this issue.
I've looked into this, and this may be a bug in both docutils and sphinx. I'm not entirely sure if sphinx.util.nodes._make_id overrides every instance of the fragment/anchor definition, but I can see this behaviour is present in docutils as well as sphinx.
https://github.com/sphinx-doc/sphinx/blob/9eb3d7940fa587f5706268f19a1d6a977ff24fad/sphinx/util/nodes.py#L538-L562
https://github.com/docutils/docutils/blob/a5b983b73263445510d032845a70082eec7e2ca9/docutils/docutils/nodes.py#L2942-L2987 (not sure why this isn't embedding)
Changing the end of this definition in docutils to the following seems to work:
# shrink runs of whitespace and replace by hyphen
id = _non_id_chars.sub('-', ' '.join(id.split()))
id = id.rstrip('-+')
if not (clean_id := id.lstrip('0123456789-')):
    clean_id = "id-" + id
return str(clean_id)
This could probably be optimized further, but it replaces the use of _non_id_at_ends: re.Pattern[str] = re.compile('^[-0-9]+|-+$') with plain string methods while still trimming the end of the ID. If stripping the leading digits and hyphens leaves nothing, we take the unstripped ID and prepend id- to it.
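For anyone who wants to try the fragment above, here is a runnable sketch. The wrapper function name and the initial lower-casing step are assumptions added to make it self-contained; the pattern is the '[^a-z0-9]+' one quoted in this thread.

```python
import re

_non_id_chars = re.compile('[^a-z0-9]+')


def make_id_with_fallback(string: str) -> str:
    """Sketch of the proposed change wrapped in a standalone function."""
    id = string.lower()
    # shrink runs of whitespace and replace by hyphen
    id = _non_id_chars.sub('-', ' '.join(id.split()))
    id = id.rstrip('-+')
    # Keep the old stripped form unless stripping would leave nothing.
    if not (clean_id := id.lstrip('0123456789-')):
        clean_id = "id-" + id
    return str(clean_id)


print(make_id_with_fallback('1.2.0'))
print(make_id_with_fallback('3 ways to contribute'))
print(make_id_with_fallback('Creating an Account'))
```

This gives id-1-2-0 for the version heading while leaving ways-to-contribute and creating-an-account as they are today. One edge case to keep in mind: a title with no ASCII letters or digits at all ends up as the bare string id-, so that case probably needs separate handling.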
This may break existing fragments on existing projects, so proper thought and consideration is necessary for this bugfix.
Alternative implementation
If the ID starts with a disallowed character, prepend id- to it after applying lstrip('-').
This would keep the entire string, so an existing name such as 3 ways to contribute would become id-3-ways-to-contribute rather than ways-to-contribute (which might be a downside in this particular example).
# shrink runs of whitespace and replace by hyphen
id = _non_id_chars.sub('-', ' '.join(id.split()))
id = id.lstrip('-').rstrip('-+')
if _non_id_at_ends.match(id):
    id = "id-" + id.lstrip()
return str(id)
_non_id_chars: re.Pattern[str] = re.compile('[^a-z0-9]+')
_non_id_at_ends: re.Pattern[str] = re.compile('^[0-9]')
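The alternative can be tried the same way. Again a sketch: the wrapper function name and the lower-casing step are assumptions, while the two patterns are the ones given just above.

```python
import re

_non_id_chars: re.Pattern[str] = re.compile('[^a-z0-9]+')
_non_id_at_ends: re.Pattern[str] = re.compile('^[0-9]')


def make_id_always_prefix(string: str) -> str:
    """Sketch of the alternative: prefix any ID that starts with a digit."""
    id = string.lower()
    # shrink runs of whitespace and replace by hyphen
    id = _non_id_chars.sub('-', ' '.join(id.split()))
    id = id.lstrip('-').rstrip('-+')
    if _non_id_at_ends.match(id):
        id = "id-" + id
    return str(id)


print(make_id_always_prefix('1.2.0'))
print(make_id_always_prefix('3 ways to contribute'))
print(make_id_always_prefix('Creating an Account'))
```

Compared with the first variant, this one changes the existing anchor for 3 ways to contribute, which is the trade-off described above. Headings that start with a letter are unaffected.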
I'm hoping to fix this, as I was interested in fixing it for the likes of black, pytest-cov, and more, but I would love to fix it upstream so that the entire ecosystem can benefit.
I've done some checking of the ecosystem to see what's affected on other projects, as this would impact a swath of notable projects.
Known affected projects (incomplete)
- requests: https://requests.readthedocs.io/en/latest/community/updates/#id2
- urllib3: https://urllib3.readthedocs.io/en/stable/changelog.html#id1
- pypa: https://www.pypa.io/en/latest/history/#id1
- flake8: https://flake8.pycqa.org/en/latest/release-notes/7.3.0.html#id1
- PyNaCl: https://pynacl.readthedocs.io/en/latest/changelog/#id1
- django: https://docs.djangoproject.com/en/5.2/releases/#id1
- black: https://black.readthedocs.io/en/stable/change_log.html#id1
- cython: https://cython.readthedocs.io/en/latest/src/changes.html#id1
Existing projects with workarounds
- sqlalchemy: https://docs.sqlalchemy.org/en/20/changelog/changelog_20.html#change-2.0.42
- alembic: https://alembic.sqlalchemy.org/en/latest/changelog.html#change-1.17.2
- cryptography*: https://cryptography.io/en/latest/changelog/#v46-0-3 (seems to have a workaround but would otherwise be impacted)
- furo*: https://pradyunsg.me/furo/changelog/#energetic-eminence (less of an impact because each version has a code name)
- pip*: https://pip.pypa.io/en/stable/news/#v25-3 (patched with a local plugin)
Potential negative impacts
https://bugzilla.readthedocs.io/en/latest/using/creating-an-account.html#creating-an-account (at least one example but more exist)
Source of these projects: https://www.sphinx-doc.org/en/master/examples.html, combined with some moderately random selection, and manual checking of changelog pages.