Inconsistency Issue with Text-to-Speech Model Output
Problem:
The text-to-speech (TTS) model produces inconsistent output: invoking it repeatedly with the same input text yields a different result each time, which hinders the reliability and predictability of the system.
Query:
Is there a method or approach to ensure repeatability in the responses generated by the TTS model?
Additional Context:
The inconsistency in the TTS model output was reproduced using the hosted demo at https://ttsdemo.themetavoice.xyz/.
Testing was conducted with the following settings:
- Speech Stability: 10
- Speaker Similarity: 5
- Preset Voices: Bria
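For anyone trying to reproduce this programmatically rather than through the web UI, a rough sketch using `gradio_client` is below; the input order and `api_name` are assumptions and should be checked against `client.view_api()`:

```python
from gradio_client import Client

# Point the client at the hosted demo.
client = Client("https://ttsdemo.themetavoice.xyz/")

# The argument order and endpoint name below are assumptions for illustration;
# run `print(client.view_api())` to see the real inputs exposed by the demo.
result = client.predict(
    "Hello, this is a repeatability test.",  # text to synthesise
    10,                                      # Speech Stability
    5,                                       # Speaker Similarity
    "Bria",                                  # preset voice
    api_name="/tts",                         # assumption -- verify with view_api()
)
print(result)
```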
Yes, you can "seed" the synthesis, but we don't provide this functionality on ttsdemo.themetavoice.xyz at the moment.
Despite hosting the Gradio UI with Docker and running the same request with seeds 0, 42, and 100, the output is still inconsistent, and its quality also appears to be deteriorating.
Could you please advise which seed value will produce consistent responses?
Try loading the latents of your best samples with a fixed seed and fixed settings.
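A minimal sketch of that workflow, assuming a PyTorch-based pipeline; `extract_speaker_latents` and `synthesise_from_latents` are hypothetical placeholders for whatever your local build exposes:

```python
import torch

SEED = 42
LATENTS_PATH = "best_sample_latents.pt"


def save_best_latents(model, reference_wav: str) -> None:
    """One-off step: extract and store the latents from the sample you like best."""
    latents = model.extract_speaker_latents(reference_wav)  # hypothetical API
    torch.save(latents, LATENTS_PATH)


def synthesise_repeatably(model, text: str) -> torch.Tensor:
    """Reuse the stored latents with a fixed seed so repeated calls match."""
    torch.manual_seed(SEED)
    latents = torch.load(LATENTS_PATH)
    return model.synthesise_from_latents(text, latents)  # hypothetical API
```

The idea is that no particular seed value is "correct"; any value works as long as the same seed, latents, and generation settings are reused on every call.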