lang attribute set to source language for untranslated text

Open thibaudcolas opened this issue 6 months ago • 2 comments

Is your feature request related to a problem? Please describe.

When working with mixed-language web content, HTML elements should use the lang attribute to define their language, if that language differs from the lang attribute of the <html> page. Without this present, documentation produced with Sphinx fails WCAG SC 3.1.2 Language of Parts.

For example, I have documentation written in English, and I use Sphinx to build the HTML of an in-progress Finnish translation. I set translation_progress_classes = True in the Sphinx config, build the docs, and get:

<!DOCTYPE html>
<html lang="fi">
<!-- […] -->
<h1 class="translated">Tervetuloa ”Sphinx Wagtail teema” -dokumentaatioon!</h1>
<!-- ❌ Untranslated text should use lang="en" -->
<p class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>

Describe the solution you'd like

All untranslated text should have a lang set to the source language. From the example above,

-<p class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>
+<p lang="en" class="untranslated">This is the Sphinx theme used for the official Wagtail docs.</p>

Describe alternatives you've considered

A poor workaround would be to do this at the theme level, by using translation_progress_classes = True and JavaScript to add the lang attribute to the relevant elements.
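
For comparison, the same patching could be sketched as a post-build step in Python instead of theme-level JavaScript. This is only an illustration and not part of Sphinx: add_source_lang is a hypothetical helper, the source language is assumed to be "en", and the regular expressions assume simple, well-formed tags like the ones in the example above (no ">" inside attribute values).

```python
import re

# Opening tags that carry at least one attribute, e.g. <p class="untranslated">.
_OPEN_TAG = re.compile(r'<(?P<name>[a-zA-Z][a-zA-Z0-9]*)(?P<attrs>\s[^<>]*)>')
_CLASS_ATTR = re.compile(r'\sclass="([^"]*)"')
# Leading whitespace keeps this from matching attributes such as hreflang=.
_LANG_ATTR = re.compile(r'\slang=')


def add_source_lang(html: str, source_lang: str = "en") -> str:
    """Add lang=source_lang to tags that have the 'untranslated' class.

    Hypothetical helper: tags that already declare a lang are left alone.
    """

    def patch(match: re.Match) -> str:
        attrs = match.group("attrs")
        class_match = _CLASS_ATTR.search(attrs)
        if class_match is None:
            return match.group(0)
        if "untranslated" not in class_match.group(1).split():
            return match.group(0)
        if _LANG_ATTR.search(attrs):
            return match.group(0)  # respect an explicit lang attribute
        return f'<{match.group("name")} lang="{source_lang}"{attrs}>'

    return _OPEN_TAG.sub(patch, html)


page = (
    '<h1 class="translated">Tervetuloa</h1>'
    '<p class="untranslated">This is the Sphinx theme.</p>'
)
print(add_source_lang(page))
```

Like the JavaScript variant, this still depends on translation_progress_classes = True to mark the untranslated elements, and it has to guess the source language, which is why a fix inside Sphinx would be preferable.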

Additional context

See Accessibility of multilingual content with mixed translation in the Python forum.

I have attempted to implement this myself alongside the AddTranslationClasses transform. Setting the attribute is a simple node['lang'] = "en", but there are two issues. First, docutils doesn’t seem to support setting the lang attribute, so we need to override starttag in HTML5Translator:

    def starttag(self, node: Element, tagname: str, *args: Any, **atts: Any) -> str:
        # Respect lang already decided by Sphinx (e.g., on <html>).
        if 'lang' not in atts:
            if lang := node.attributes.get('lang'):
                atts['lang'] = lang
        return super().starttag(node, tagname, *args, **atts)

Second, and more problematic, I can’t see a way to fetch the language of the source document. The existing language configuration option is for the target language. Adding a source_language configuration option that also defaults to en would solve this, but I’m not sure if there is a better way without introducing the extra option.
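
As a rough sketch of how the proposed option might look in conf.py if it were added: language and translation_progress_classes are existing Sphinx options, while source_language is the hypothetical option suggested in this issue and is not a registered Sphinx setting at the time of writing.

```python
# conf.py (sketch)

# Existing option: the target language of the build.
language = "fi"

# Existing option: mark nodes with "translated" / "untranslated" classes.
translation_progress_classes = True

# HYPOTHETICAL option proposed in this issue: the language the source
# documents are written in, used as lang="..." on untranslated elements.
# Assumed here to default to "en", mirroring the default of `language`.
source_language = "en"
```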

thibaudcolas • Aug 16 '25 11:08

Got to chat about this with @AA-Turner at PyCon UK. The likely approach seems to be adding a source_language configuration option, with en being a good default for practical purposes (similar to how language defaults to it). There are possible concerns about migration to this new configuration and possible breakage.

thibaudcolas • Sep 20 '25 15:09

Should we consider adding a new "language" attribute to all Docutils Doctree elements?

gmilde • Dec 09 '25 12:12