Mantella
Allow emotion variations of XTTS latents
XTTS can struggle to reflect changes in emotion across text prompts. However, if a voice's latents are built solely from WAV files of a single emotion, XTTS can carry that emotion across text prompts.
One workaround for this limitation would be to create a separate latent file for each emotion of a given voice model, then load the appropriate latent based on the emotion of each sentence. The LLM could decide that emotion via actions, similar to the existing Offended and Follow actions.
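As a rough sketch of the LLM-driven part, assume the LLM is prompted to prefix a sentence with an emotion tag such as "Happy:". This tag format, the EMOTIONS set, and split_emotion are hypothetical and not part of Mantella's current action syntax. The function below strips a recognized tag and returns the emotion with the remaining text. Untagged sentences, and sentences whose prefix is not a known emotion, fall back to "neutral".

```python
import re

# Hypothetical emotion keywords the LLM would be prompted to emit;
# these are NOT existing Mantella actions.
EMOTIONS = {"neutral", "happy", "sad", "angry", "fearful"}

# Matches a leading "Word:" tag, e.g. "Happy: Well met!"
_TAG = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.*)$", re.DOTALL)


def split_emotion(sentence: str, default: str = "neutral") -> tuple[str, str]:
    """Return (emotion, text). Unknown or missing tags yield the default emotion."""
    match = _TAG.match(sentence)
    if match and match.group(1).lower() in EMOTIONS:
        return match.group(1).lower(), match.group(2).strip()
    # Not an emotion tag (or no tag at all): leave the sentence untouched
    return default, sentence.strip()
```

Leaving unrecognized prefixes in place means other tags, such as existing actions, still pass through to whatever code already handles them.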
For example, if the "neutral" latent is simply femalenord.json, a "happy" latent could be found by looking for femalenord_happy.json.
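The file-naming scheme above could be resolved with a small lookup that falls back to the base latent when no emotion-specific file exists, so voices without emotion variants keep working. This is a minimal sketch. The resolve_latent name and the idea of a single latent directory are assumptions for illustration, not how Mantella currently stores or loads latents.

```python
from pathlib import Path


def resolve_latent(latent_dir: Path, voice: str, emotion: str = "neutral") -> Path:
    """Pick the latent file for a voice/emotion pair (hypothetical helper).

    "neutral" maps to the base file (e.g. femalenord.json); any other emotion
    maps to <voice>_<emotion>.json (e.g. femalenord_happy.json). If that file
    is missing, the base latent is returned instead.
    """
    base = latent_dir / f"{voice}.json"
    if emotion == "neutral":
        return base
    variant = latent_dir / f"{voice}_{emotion}.json"
    # Fall back to the neutral latent when no emotion variant was created
    return variant if variant.exists() else base
```

For instance, resolve_latent(Path("latents"), "femalenord", "happy") would return latents/femalenord_happy.json if that file exists, and latents/femalenord.json otherwise.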