metavoice-src

Inconsistency Issue with Text-to-Speech Model Output

Open Prajval108 opened this issue 1 year ago • 4 comments

Problem:

The text-to-speech (TTS) model is exhibiting inconsistency in its output. Every time the model is invoked, it generates different responses, which hinders the reliability and predictability of the system.

Query:

Is there a method or approach to ensure repeatability in the responses generated by the TTS model?

Additional Context:

The inconsistency was observed on the hosted demo at https://ttsdemo.themetavoice.xyz/.

Testing was conducted with the following settings:

Speech Stability: 10, Speaker Similarity: 5, Preset Voices: Bria

Attached samples: speech-1, speech-2

Prajval108, Apr 03 '24 15:04

Yes, you can "seed" the synthesis, but we don't provide this functionality on ttsdemo.themetavoice.xyz at the moment.

vatsalaggarwal, Apr 03 '24 15:04
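
For readers unfamiliar with what "seeding" means here, the following is a minimal sketch of the general idea, not MetaVoice's actual code. It assumes the model picks each output token by sampling from a probability distribution, and it uses only Python's standard library. The function name `sample_tokens` and the toy distribution are made up for illustration. In a PyTorch program the analogous step would be calling `torch.manual_seed` before sampling.

```python
import random


def sample_tokens(probs, n_steps, seed=None):
    """Draw n_steps token ids from a fixed categorical distribution.

    Hypothetical stand-in for a TTS sampler: a real model would
    recompute `probs` at every step, but the role of the seed is the same.
    """
    # A private generator keeps the result independent of other code
    # that uses the global `random` state. seed=None means "unseeded".
    rng = random.Random(seed)
    token_ids = list(range(len(probs)))
    return rng.choices(token_ids, weights=probs, k=n_steps)


probs = [0.5, 0.3, 0.15, 0.05]  # toy distribution over 4 tokens

run_a = sample_tokens(probs, 20, seed=42)
run_b = sample_tokens(probs, 20, seed=42)

# Same seed and same settings reproduce the same sequence.
print(run_a == run_b)  # True
```

Without a seed, each call starts from a different random state, which matches the run-to-run variation reported above. A fixed seed only guarantees repeatability when the text, the sampling settings, and the rest of the pipeline are also unchanged.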

Despite hosting the Gradio UI with Docker and configuring the settings with seeds 0, 42, and 100, the issue of inconsistent output persists. Additionally, the output quality is also deteriorating.

Could you please advise which seed value will provide consistent responses?

Prajval108, Apr 04 '24 06:04

Try loading the latents of your best samples with a fixed seed and settings.

G-force78, Apr 06 '24 09:04