lang attribute set to source language for untranslated text
Is your feature request related to a problem? Please describe.
When working with mixed-language web content, HTML elements should use the lang attribute to declare their language whenever it differs from the lang attribute of the <html> element. Without it, documentation produced with Sphinx fails WCAG SC 3.1.2 Language of Parts.
For example, I have documentation written in English, and I use Sphinx to build the HTML of an in-progress Finnish translation. I set the Sphinx config to translation_progress_classes = True, build the docs, and get:
<!DOCTYPE html>
<html lang="fi">
<!-- […] -->
<h1 class="translated">Tervetuloa ”Sphinx Wagtail teema” -dokumentaatioon!</h1>
<!-- ❌ Untranslated text should use lang="en" -->
<p class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>
Describe the solution you'd like
All untranslated text should have the lang attribute set to the source language. From the example above:
-<p class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>
+<p lang="en" class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>
Describe alternatives you've considered
A poor workaround would be to do this at the theme level, by using translation_progress_classes = True and JavaScript to add the lang attribute to the relevant elements.
Additional context
See Accessibility of multilingual content with mixed translation in the Python forum.
I have attempted to implement this myself alongside the AddTranslationClasses transform. The change itself is a simple node['lang'] = "en", but there are two issues. First, docutils doesn’t seem to support setting the lang attribute, so we need to override starttag in HTML5Translator:
def starttag(self, node: Element, tagname: str, *args: Any, **atts: Any) -> str:
    # Respect lang already decided by Sphinx (e.g., on <html>).
    if 'lang' not in atts:
        if lang := node.attributes.get('lang'):
            atts['lang'] = lang
    return super().starttag(node, tagname, *args, **atts)
Second, and more problematic, I can’t see a way to fetch the language of the source document. The existing language configuration option is for the target language. Adding a source_language configuration option that also defaults to en would solve this, but I’m not sure if there is a better way without introducing the extra option.
I got to chat about this with @AA-Turner at PyCon UK. The likely approach seems to be adding a source_language configuration option, with en being a good default for practical purposes (similar to how language defaults to it). Possible concerns are migration to this new configuration option and potential breakage.
Should we consider adding a new "language" attribute to all Docutils doctree elements?