Add support for additional TTS integrations through non-Microsoft focused SpeechService interface

Open druggedhippo opened this issue 3 years ago • 3 comments

EDDI currently uses whatever built-in Windows TTS system is installed. Unfortunately, the built-in Windows TTS voices are not particularly good.

This feature request asks for a more modular SpeechService class that lets other speech engines plug in without relying on the Windows TTS interfaces, as long as they provide the same WAV stream that the existing class uses.

Examples of other engines could include (but are not limited to):

  • Amazon Polly - https://ai-service-demos.go-aws.com/polly
  • Google - https://cloud.google.com/text-to-speech
  • Microsoft Azure - https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/
  • Different versions of the SAPI interface
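To make the request concrete, here is a minimal, language-neutral sketch of what such a plug-in contract might look like. It is written in Python for brevity even though EDDI itself is C#. Every name in it (`SpeechEngine`, `SilenceEngine`, `SpeechService.register`, `SpeechService.speak`) is hypothetical and not taken from EDDI's code base. The stand-in engine just emits silence so the example runs offline; a real engine would call Polly, Google, Azure, or SAPI and return that service's audio as WAV bytes.

```python
import io
import wave
from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical plug-in contract: text in, complete WAV file bytes out."""

    name: str

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Return a full WAV file (header plus samples) for the given text."""


class SilenceEngine(SpeechEngine):
    """Stand-in engine that emits silence, so the sketch runs without credentials."""

    name = "silence"

    def synthesize(self, text: str, voice: str) -> bytes:
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)        # mono
            wav.setsampwidth(2)        # 16-bit samples
            wav.setframerate(16000)    # 16 kHz
            # 800 frames (50 ms) of silence per character, as placeholder audio
            wav.writeframes(b"\x00\x00" * (800 * len(text)))
        return buffer.getvalue()


class SpeechService:
    """Hypothetical host-side service that only ever sees WAV bytes."""

    def __init__(self) -> None:
        self._engines: dict[str, SpeechEngine] = {}

    def register(self, engine: SpeechEngine) -> None:
        self._engines[engine.name] = engine

    def speak(self, engine_name: str, text: str, voice: str = "default") -> bytes:
        # The caller picks an engine by name; playback code stays engine-agnostic.
        return self._engines[engine_name].synthesize(text, voice)


service = SpeechService()
service.register(SilenceEngine())
wav_bytes = service.speak("silence", "hi")
```

The point of the design is that the playback side depends only on the returned WAV bytes, so adding an engine would mean registering one more class rather than touching the code that plays audio.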

As a proof of concept, here is an Amazon Polly implementation I created.

https://gist.github.com/druggedhippo/0a887973ee019dea1fc9e522f513b0f5

Example audio of Amazon Polly processing an EDDI TTS prompt in real time:

https://imgur.com/zyoWmQg

druggedhippo · Aug 19 '22 15:08

Thank you for this. 😀

As you have effectively demonstrated, it is indeed possible to add additional speech synthesizers to EDDI, including voices sourced from various cloud platforms (Azure, AWS, etc.).

These cloud voices typically require the user to provide specific credentials and are limited in some way (either as timed trials or offering to render a limited number of words for free each month).

We're happy to support additional voices in EDDI, but it is also important to note that voices from different sources do not always behave alike (in terms of SSML support, lexicons, etc.).
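One possible way to cope with these differences is to have each voice declare what it supports and to degrade the input when a feature is missing. The sketch below is only an illustration under that assumption, written in Python rather than EDDI's C#. `VoiceCapabilities` and `prepare_input` are hypothetical names, and the regex-based tag stripping is a deliberately crude fallback, not a real SSML parser.

```python
import html
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceCapabilities:
    """Hypothetical per-voice feature flags."""

    supports_ssml: bool = False
    supports_lexicons: bool = False


def prepare_input(text: str, caps: VoiceCapabilities) -> str:
    """Pass SSML through when supported, otherwise fall back to plain text."""
    if caps.supports_ssml:
        return text
    # Crude fallback: drop anything that looks like a tag, then decode entities.
    plain = re.sub(r"<[^>]+>", "", text)
    return html.unescape(plain).strip()


ssml = '<speak>Fuel <break time="200ms"/>low &amp; falling</speak>'
plain_text = prepare_input(ssml, VoiceCapabilities())
passthrough = prepare_input(ssml, VoiceCapabilities(supports_ssml=True))
```

A voice without SSML support would then still receive readable text instead of raw markup, at the cost of losing pauses and emphasis.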

We would need to do some additional work to document the new capability and help users enter their credentials for accessing the voice. Some UI changes to allow capturing credentials in EDDI would probably also be very welcome.

Tkael · Oct 03 '22 18:10

Related: https://github.com/jamescl604/MSCognitiveSpeechForVoiceAttack

Tkael · Nov 18 '22 02:11

https://cloud.google.com/text-to-speech/docs/libraries

Tkael · May 15 '23 00:05