Create an API accessible through the scripting interface to generate and play text to speech
Qt seems to support TTS in its ecosystem: https://doc.qt.io/qt-5/qtspeech-index.html
It does, but it seems extremely limited.
It seems to target an extremely minimal usage scenario, probably for the sake of accessibility. Better than nothing, of course, but I think it's going to be too limited for our purposes.
Understood. I looked only briefly into the Qt-provided TTS, and there was not a lot of information about it. We will likely have to find a more purpose-built package for generating TTS audio. I've updated the issue title accordingly.
We actually already have this. It's called TextToSpeech, but it's completely undocumented and only works on Windows. On Linux we could dynamically link at runtime to the system's libspeechd (lots of free and proprietary software does this; we can't hard-link it in because it's LGPLv2.1-or-later).
interface/src/scripting/TTSScriptingInterface.cpp
In interface/resources/qml/hifi/tts/TTS.qml, it looks like there even used to be a tool somewhere for people to speak through their avatar using TTS.
Maybe https://github.com/espeak-ng/espeak-ng would be a good way to get a cross-platform library for TTS?
espeak-ng is GPLv3 so we can't directly link to it. libspeechd is basically an LGPL shim to other possibly-GPL'ed or proprietary speech providers like eSpeak to make them safe to use in terms of licensing.
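For reference, a rough sketch of what loading libspeechd at runtime could look like on Linux, using dlopen() rather than linking at build time. This is only an illustration, not existing project code: the class name SpeechDispatcher, the client name "tts-sketch", and the soname "libspeechd.so.2" are assumptions, and the enum arguments of spd_open and spd_say are passed as plain ints on the assumption that SPD_MODE_SINGLE is 0 and SPD_TEXT is 3 as declared in libspeechd.h.

```cpp
// Linux-only sketch: resolve the libspeechd C API at runtime via dlopen().
#include <dlfcn.h>

// Opaque handle; the real struct is defined in <libspeechd.h>.
struct SPDConnection;

// Function pointer types mirroring spd_open, spd_say and spd_close.
// Assumption: the enum parameters are ABI-compatible with int.
using SpdOpenFn = SPDConnection* (*)(const char*, const char*, const char*, int);
using SpdSayFn = int (*)(SPDConnection*, int, const char*);
using SpdCloseFn = void (*)(SPDConnection*);

// Hypothetical wrapper; the soname default is an assumption and may differ per distro.
class SpeechDispatcher {
public:
    explicit SpeechDispatcher(const char* soname = "libspeechd.so.2") {
        _lib = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
        if (!_lib) {
            return;  // Library not installed: TTS stays unavailable.
        }
        _open = reinterpret_cast<SpdOpenFn>(dlsym(_lib, "spd_open"));
        _say = reinterpret_cast<SpdSayFn>(dlsym(_lib, "spd_say"));
        _close = reinterpret_cast<SpdCloseFn>(dlsym(_lib, "spd_close"));
        if (_open && _say && _close) {
            // 0 is assumed to be SPD_MODE_SINGLE; a null user name uses the default.
            _conn = _open("tts-sketch", "main", nullptr, 0);
        }
    }

    ~SpeechDispatcher() {
        if (_conn) {
            _close(_conn);
        }
        if (_lib) {
            dlclose(_lib);
        }
    }

    SpeechDispatcher(const SpeechDispatcher&) = delete;
    SpeechDispatcher& operator=(const SpeechDispatcher&) = delete;

    // True only if the library loaded and a connection to the daemon was opened.
    bool available() const { return _conn != nullptr; }

    // Queues text for speaking; 3 is assumed to be SPD_TEXT. spd_say returns -1 on failure.
    bool say(const char* text) {
        return _conn != nullptr && _say(_conn, 3, text) != -1;
    }

private:
    void* _lib = nullptr;
    SPDConnection* _conn = nullptr;
    SpdOpenFn _open = nullptr;
    SpdSayFn _say = nullptr;
    SpdCloseFn _close = nullptr;
};
```

With something like this, a caller would construct a SpeechDispatcher once, check available(), and fall back to no TTS if the library or the speech-dispatcher daemon is missing. Older glibc versions also need -ldl at link time.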
I've just checked the QTextToSpeech page, and they also use the native engines on Windows/macOS/Android, and libspeechd on Linux.
I think the existing old TextToSpeech API we have might have been meant to also output PCM through the avatar. Neither QTextToSpeech nor libspeechd seems to support that, so it would only be usable for received chat messages or the UI.