NaturalVoiceSAPIAdapter: Request for Higher Audio Quality Configuration Support

On Windows 11, when using Narrator with 24kHz audio output, we’ve observed occasional distortion or popping artifacts in specific scenarios. To improve playback quality, would it be possible to introduce a configuration option that enables Narrator to output audio at 48kHz?

This enhancement would help ensure clearer and more stable audio performance, especially in environments sensitive to sound fidelity.

Aug 07 '25 09:08 LeXwDeX

If you are using offline natural voices, then unfortunately the only supported format is 24 kHz. This limitation is documented for embedded speech.

If you are using online voices, then it might be possible to use a higher-quality format. I chose the 24kHz 48kbit/s mono MP3 format just because it's convenient and supported by Edge and Azure voices. Edge voices have more limitations, so not all formats are supported.

I wonder how much of a quality improvement going from 24kHz to 48kHz would actually bring. The current 24kHz format sounds decent to me. Could you give me some examples of "occasional distortion or popping artifacts" that are easy to reproduce?

Aug 07 '25 11:08 gexgd0419

The difference in audio quality is noticeable even on regular headphones. I checked the code and found that with some minor adjustments to the decoder and request parameters, support for 48kHz output is possible (though I’m not sure if this will actually be utilized by Windows Narrator). I also compiled my own install.exe and x64 DLL to register for testing.

My main use case is enabling TTS in the game World of Warcraft, where I’ve noticed some strange popping noises. Subjectively, after making the modifications mentioned above, the popping sounds have been reduced.

Aug 08 '25 01:08 LeXwDeX

Are you using online voices?

Could you tell me the voice and the new parameters you are using, so that I can experiment and implement it myself?

Aug 08 '25 01:08 gexgd0419

Azure Online TTS Server:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech?tabs=streaming#audio-outputs

NaturalVoiceSAPIAdapter\SpeechRestAPI.cpp

// Send configuration and wait for audio data response
void SpeechRestAPI::SendRequest(const WSConnectionPtr& conn)
{
	m_allDataReceived = false;

	nlohmann::json json = {
		{"context", {
			{"synthesis", {
				{"audio", {
					{"metadataOptions", {
						{"bookmarkEnabled", (bool)BookmarkCallback},
						{"punctuationBoundaryEnabled", (bool)PunctuationBoundaryCallback},
						{"sentenceBoundaryEnabled", (bool)SentenceBoundaryCallback},
						{"wordBoundaryEnabled", (bool)WordBoundaryCallback},
						{"visemeEnabled", (bool)VisemeCallback},
					}},
					{"outputFormat", "audio-48khz-192kbitrate-mono-mp3"} // changed from the 24kHz 48kbit/s mono MP3 format
				}},
				{"language", {
					{"autoDetection", false}
				}}
			}}
		}}
	};

NaturalVoiceSAPIAdapter\TTSEngine.cpp

STDMETHODIMP CTTSEngine::GetOutputFormat(const GUID* /*pTargetFormatId*/, const WAVEFORMATEX* /*pTargetWaveFormatEx*/,
    GUID* pDesiredFormatId, WAVEFORMATEX** ppCoMemDesiredWaveFormatEx) noexcept
{
    // Report 48kHz 16-bit mono PCM to SAPI so it matches the 48kHz Azure output format requested above
    return SpConvertStreamFormatEnum(SPSF_48kHz16BitMono, pDesiredFormatId, ppCoMemDesiredWaveFormatEx);
}
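
Since the original request was for a configuration option rather than a hard-coded change, here is a minimal sketch of how the two edits above could be driven by one setting so the request format and the rate reported to SAPI cannot drift apart. Everything in it is hypothetical and not part of NaturalVoiceSAPIAdapter: the names VoiceKind, OutputFormatChoice and SelectOutputFormat are made up for illustration, and keeping Edge voices at 24kHz is only an assumption based on the remark that Edge supports fewer formats.

```cpp
#include <string_view>

// Hypothetical voice categories, mirroring the cases discussed in this thread.
enum class VoiceKind { Offline, Edge, Azure };

// Hypothetical pairing of the "outputFormat" request value with the sample
// rate of the decoded PCM that the engine would report to SAPI.
struct OutputFormatChoice {
    std::string_view azureFormat; // empty for offline voices (no request is sent)
    int sampleRateHz;             // e.g. 24000 -> SPSF_24kHz16BitMono, 48000 -> SPSF_48kHz16BitMono
};

// Hypothetical helper: choose one consistent format from a user preference.
// - Offline voices: 24kHz is the only supported rate, so the preference is ignored.
// - Edge voices: assumed to stay on the existing 24kHz MP3 format (not verified).
// - Azure voices: use the 48kHz MP3 format only when the user opts in.
constexpr OutputFormatChoice SelectOutputFormat(VoiceKind kind, bool prefer48kHz)
{
    if (kind == VoiceKind::Offline)
        return {"", 24000};
    if (kind == VoiceKind::Azure && prefer48kHz)
        return {"audio-48khz-192kbitrate-mono-mp3", 48000};
    return {"audio-24khz-48kbitrate-mono-mp3", 24000};
}
```

With something like this, SendRequest would take azureFormat for the JSON field and GetOutputFormat would pick the SPSF_* value from sampleRateHz, so leaving the option off keeps the current 24kHz behavior.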

Aug 08 '25 06:08 LeXwDeX

Since you are using this in World of Warcraft, is #37 a problem for you? Someone reported that World of Warcraft 11.2 is not working with the latest version of this engine.

Aug 08 '25 09:08 gexgd0419

I used the Lua API in World of Warcraft and didn’t encounter this issue.

I came across this popping sound issue quite a while ago. I think I’ll try switching the audio from a streaming RESTful API to a one-time full file request to test it.

Aug 09 '25 06:08 LeXwDeX