NaturalVoiceSAPIAdapter

pyttsx3: ValueError when parsing 'Language' attribute with value '409;9'

Open · maximq opened this issue 2 months ago · 1 comment

Hello,

I'm encountering a ValueError when trying to get the voice language using the pyttsx3 library. The issue occurs in the sapi5.py file when the Language attribute is parsed.

The Problem: The attr.GetAttribute("Language") call returns a string '409;9'. This value is then passed to int(language_attr, 16), which fails because '409;9' is not a valid hexadecimal string (due to the semicolon).

Code from sapi5.py

language_attr = attr.GetAttribute("Language") # Returns '409;9'
language_code = int(language_attr, 16) # Throws ValueError: invalid literal for int() with base 16: '409;9'

My Questions:

Where does the value '409;9' come from? It looks like a combination of a language ID (409 appears to be a language code) and another identifier.

Is it possible to handle this gracefully without rewriting the library? For example, by splitting the string on the semicolon and using only the first part ('409') for the hexadecimal conversion?

Environment:

OS: Win 11

Python Version: 3.10.11

pyttsx3 Version: 2.99

maximq, Oct 02 '25 13:10

SAPI 5 allows the "Language" attribute to have multiple values separated by semicolons, where each value is a hexadecimal language ID without 0x prefix.

When a voice supports more than one language, it can list all supported languages. It can also add a "neutral" language ID for fallback. In this case, "409" is "English (United States)", but it won't match other English dialects. Therefore, "9", region-neutral English, is also added.
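As a rough illustration of the format described above, here is a minimal Python sketch that splits such an attribute into integer language IDs. The function name `parse_sapi_language_ids` is made up for this example and is not part of pyttsx3 or SAPI.

```python
def parse_sapi_language_ids(language_attr: str) -> list[int]:
    """Split a SAPI 5 "Language" attribute into integer language IDs.

    Hypothetical helper: assumes segments are separated by semicolons and
    each one is a hexadecimal number without a 0x prefix.
    """
    ids = []
    for part in language_attr.split(";"):
        part = part.strip()
        if part:  # skip empty segments, e.g. from a trailing semicolon
            ids.append(int(part, 16))
    return ids


print(parse_sapi_language_ids("409;9"))  # [1033, 9]
```

Here 1033 is 0x409, "English (United States)", and 9 is region-neutral English.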

This gives the voice more opportunity to be matched when the application uses a series of language IDs as the filtering criteria. When using SpEnumTokens or SpCreateBestObject with attribute criteria set to Language=409, SAPI 5 will know to match the language ID 409 with every item in the language list, so either Language=409 or Language=9 can match.
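The matching behavior described above can be modeled roughly like this in Python. This is only a sketch of the idea, not how SAPI 5 implements it internally, and `voice_matches_language` is a hypothetical name.

```python
def voice_matches_language(voice_language_attr: str, requested_id: str) -> bool:
    """Return True if the requested hex language ID appears in the voice's list.

    Simplified model (assumption): the requested ID is compared against every
    semicolon-separated item of the voice's "Language" attribute.
    """
    requested = int(requested_id, 16)
    offered = {
        int(part, 16)
        for part in voice_language_attr.split(";")
        if part.strip()  # ignore empty segments
    }
    return requested in offered


print(voice_matches_language("409;9", "409"))  # True
print(voice_matches_language("409;9", "9"))    # True
print(voice_matches_language("409", "9"))      # False
```

The last call shows why the neutral ID is added: a voice that lists only "409" would not be found by a Language=9 query.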

When you only need a single language, the voice's main one, you can just take the first segment. That's how SpGetLanguageFromToken gets the language ID: it extracts the string before the first semicolon.
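In Python, taking the first segment could look something like the sketch below. `primary_language_id` is a hypothetical helper, not an existing pyttsx3 function; it only shows the split-then-convert step that would replace the failing `int(language_attr, 16)` call.

```python
def primary_language_id(language_attr: str) -> int:
    """Return the first language ID of a SAPI 5 "Language" attribute.

    Hypothetical helper: keeps only the text before the first semicolon and
    parses it as hexadecimal, so single values like "409" work unchanged.
    """
    first_segment = language_attr.split(";", 1)[0].strip()
    return int(first_segment, 16)


print(primary_language_id("409;9"))  # 1033
print(primary_language_id("409"))    # 1033
```

An empty or non-hexadecimal first segment would still raise ValueError, so a caller may want to catch that separately.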

Built-in voices from the Windows XP era also had this kind of "Language" attribute, such as "409;9", so I decided to follow that convention. The built-in voices in Windows 10/11, however, only have a single language ID. Third-party voices may still list multiple language IDs, so I think pyttsx3 might need to address this issue.

gexgd0419, Oct 02 '25 14:10