Inconsistency Issue with Text-to-Speech Model Output
Problem:
The text-to-speech (TTS) model produces inconsistent output: invoking it repeatedly with the same input text yields a different result each time, which hinders the reliability and predictability of the system.
Query:
Is there a method or approach to ensure repeatability in the responses generated by the TTS model?
Additional Context:
The inconsistency in the TTS model output was reproduced using the hosted demo at https://ttsdemo.themetavoice.xyz/.
Testing was conducted with the following settings:
- Speech Stability: 10
- Speaker Similarity: 5
- Preset Voices: Bria
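For anyone trying to reproduce this programmatically rather than through the web UI, a rough sketch using `gradio_client` is below; the input order and `api_name` are assumptions and should be checked against `client.view_api()`:

```python
from gradio_client import Client

# Point the client at the hosted demo.
client = Client("https://ttsdemo.themetavoice.xyz/")

# The argument order and endpoint name below are assumptions for illustration;
# run `print(client.view_api())` to see the real inputs exposed by the demo.
result = client.predict(
    "Hello, this is a repeatability test.",  # text to synthesise
    10,                                      # Speech Stability
    5,                                       # Speaker Similarity
    "Bria",                                  # preset voice
    api_name="/tts",                         # assumption -- verify with view_api()
)
print(result)
```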
Yes, you can "seed" the synthesis, but we don't provide this functionality on ttsdemo.themetavoice.xyz at the moment.
Despite hosting the Gradio UI with Docker and running the same request with seeds 0, 42, and 100, the output is still inconsistent, and its quality also appears to be deteriorating.
Could you please advise which seed value will produce consistent responses?
Try loading the latents of your best samples with a fixed seed and fixed settings.
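A minimal sketch of that workflow, assuming a PyTorch-based pipeline; `extract_speaker_latents` and `synthesise_from_latents` are hypothetical placeholders for whatever your local build exposes:

```python
import torch

SEED = 42
LATENTS_PATH = "best_sample_latents.pt"


def save_best_latents(model, reference_wav: str) -> None:
    """One-off step: extract and store the latents from the sample you like best."""
    latents = model.extract_speaker_latents(reference_wav)  # hypothetical API
    torch.save(latents, LATENTS_PATH)


def synthesise_repeatably(model, text: str) -> torch.Tensor:
    """Reuse the stored latents with a fixed seed so repeated calls match."""
    torch.manual_seed(SEED)
    latents = torch.load(LATENTS_PATH)
    return model.synthesise_from_latents(text, latents)  # hypothetical API
```

The idea is that no particular seed value is "correct"; any value works as long as the same seed, latents, and generation settings are reused on every call.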