Request for Higher Audio Quality Configuration Support
On Windows 11, when using Narrator with 24kHz audio output, we’ve observed occasional distortion or popping artifacts in specific scenarios. To improve playback quality, would it be possible to introduce a configuration option that enables Narrator to output audio at 48kHz?
This enhancement would help ensure clearer and more stable audio performance, especially in environments sensitive to sound fidelity.
If you are using offline natural voices, then unfortunately the only supported format is 24 kHz. This limitation is documented in the embedded speech documentation.
If you are using online voices, then it might be possible to use a higher-quality format. I chose the 24kHz 48kbit/s mono MP3 format simply because it is convenient and supported by both Edge and Azure voices. Edge voices have more limitations, so they do not support every format.
I wonder how much of a quality improvement going from 24kHz to 48kHz would actually bring. The current 24kHz format sounds decent to me. Could you give me some easy-to-reproduce examples of the "occasional distortion or popping artifacts"?
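For a rough sense of scale, here is a small sketch of the arithmetic behind the two sample rates. It only computes the Nyquist limit (the highest frequency a sample rate can represent) and the uncompressed 16-bit mono PCM data rate. The function names are made up for illustration and are not part of the adapter. This says nothing about whether the popping is caused by the sample rate.

```cpp
#include <cassert>
#include <cstdint>

// Highest frequency (in Hz) that a given sample rate can represent.
// Illustrative helper, not part of NaturalVoiceSAPIAdapter.
inline std::uint32_t nyquistHz(std::uint32_t sampleRateHz)
{
    return sampleRateHz / 2;
}

// Data rate of uncompressed PCM in bits per second.
inline std::uint32_t pcmBitsPerSecond(std::uint32_t sampleRateHz,
                                      std::uint32_t bitsPerSample,
                                      std::uint32_t channels)
{
    assert(bitsPerSample % 8 == 0); // whole bytes per sample only
    return sampleRateHz * bitsPerSample * channels;
}
```

With these helpers, 24kHz audio tops out at 12kHz of bandwidth and 48kHz audio at 24kHz, and the decoded 16-bit mono stream doubles from 384 kbit/s to 768 kbit/s.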
The difference in audio quality is noticeable even on regular headphones. I checked the code and found that, with some minor adjustments to the decoder and request parameters, 48kHz output is possible (though I’m not sure whether Windows Narrator will actually make use of it). I also built my own install.exe and x64 DLL and registered them for testing.
My main use case is enabling TTS in the game World of Warcraft, where I’ve noticed some strange popping noises. Subjectively, after making the modifications mentioned above, the popping sounds have been reduced.
Are you using online voices?
Could you tell me the voice and the new parameters you are using, so that I can experiment and implement it myself?
Azure Online TTS Server:
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech?tabs=streaming#audio-outputs
NaturalVoiceSAPIAdapter\SpeechRestAPI.cpp
// Send configuration and wait for audio data response
void SpeechRestAPI::SendRequest(const WSConnectionPtr& conn)
{
    m_allDataReceived = false;
    nlohmann::json json = {
        {"context", {
            {"synthesis", {
                {"audio", {
                    {"metadataOptions", {
                        {"bookmarkEnabled", (bool)BookmarkCallback},
                        {"punctuationBoundaryEnabled", (bool)PunctuationBoundaryCallback},
                        {"sentenceBoundaryEnabled", (bool)SentenceBoundaryCallback},
                        {"wordBoundaryEnabled", (bool)WordBoundaryCallback},
                        {"visemeEnabled", (bool)VisemeCallback},
                    }},
                    {"outputFormat", "audio-48khz-192kbitrate-mono-mp3"}
                }},
                {"language", {
                    {"autoDetection", false}
                }}
            }}
        }}
    };
NaturalVoiceSAPIAdapter\TTSEngine.cpp
STDMETHODIMP CTTSEngine::GetOutputFormat(const GUID* /*pTargetFormatId*/, const WAVEFORMATEX* /*pTargetWaveFormatEx*/,
    GUID* pDesiredFormatId, WAVEFORMATEX** ppCoMemDesiredWaveFormatEx) noexcept
{
    // Azure 48kHz
    return SpConvertStreamFormatEnum(SPSF_48kHz16BitMono, pDesiredFormatId, ppCoMemDesiredWaveFormatEx);
}
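The two changes above have to agree: the sample rate in the requested outputFormat must match the PCM format reported to SAPI, otherwise the decoded audio would play at the wrong speed. Below is a minimal, standalone sketch of that consistency check. PcmFormat, sampleRateFromFormatName, makePcm16Mono and formatsAgree are hypothetical names invented for this example and do not exist in the adapter. PcmFormat only mirrors the relevant WAVEFORMATEX fields so that the sketch compiles without Windows headers. The parser only understands names that contain an "NNkhz" token, which is an assumption that holds for the two format names mentioned in this thread. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the WAVEFORMATEX fields that matter here.
struct PcmFormat
{
    std::uint32_t samplesPerSec;
    std::uint16_t bitsPerSample;
    std::uint16_t channels;
    std::uint16_t blockAlign;     // bytes per sample frame
    std::uint32_t avgBytesPerSec; // samplesPerSec * blockAlign
};

// Extract the sample rate from a name such as "audio-48khz-192kbitrate-mono-mp3".
// Returns std::nullopt when no "<digits>khz" token is present.
inline std::optional<std::uint32_t> sampleRateFromFormatName(const std::string& name)
{
    const std::size_t pos = name.find("khz");
    if (pos == std::string::npos)
        return std::nullopt;
    std::size_t start = pos;
    while (start > 0 && std::isdigit(static_cast<unsigned char>(name[start - 1])))
        --start;
    if (start == pos)
        return std::nullopt; // "khz" was not preceded by digits
    const unsigned long khz = std::stoul(name.substr(start, pos - start));
    return static_cast<std::uint32_t>(khz) * 1000u;
}

// Describe 16-bit mono PCM at the given sample rate.
inline PcmFormat makePcm16Mono(std::uint32_t sampleRateHz)
{
    assert(sampleRateHz > 0);
    const std::uint16_t bits = 16;
    const std::uint16_t channels = 1;
    const std::uint16_t blockAlign = static_cast<std::uint16_t>(channels * bits / 8);
    return PcmFormat{sampleRateHz, bits, channels, blockAlign, sampleRateHz * blockAlign};
}

// True when the requested output format and the PCM rate reported to SAPI match.
inline bool formatsAgree(const std::string& outputFormat, std::uint32_t sapiSampleRateHz)
{
    const auto rate = sampleRateFromFormatName(outputFormat);
    return rate.has_value() && *rate == sapiSampleRateHz;
}
```

For example, formatsAgree("audio-48khz-192kbitrate-mono-mp3", 48000) holds, while pairing the original 24kHz MP3 format with a 48kHz SAPI format does not.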
Since you are using this in World of Warcraft, is #37 a problem for you? Someone reported that World of Warcraft 11.2 is not working with the latest version of this engine.
I used the Lua API in World of Warcraft and didn’t encounter this issue.
I came across this popping sound issue quite a while ago. To test it, I think I’ll try switching the audio retrieval from the streaming RESTful API to a one-time request for the full file.