
Allow emotion variations of XTTS latents

Open: art-from-the-machine opened this issue 10 months ago • 1 comment

XTTS can struggle with variations in emotion in text prompts. However, if trained solely on wav files of a certain emotion, XTTS can carry that emotion across text prompts.

A workaround for XTTS's limited emotional range would be to create a separate latent file for each emotion of a given voice model, and load the matching latent based on the emotion of the sentence. This emotion can be decided by the LLM via actions (similar to Offended and Follow).

For example, if a "neutral" latent is simply femalenord.json, a "happy" latent could be found by searching for femalenord_happy.json.
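
The lookup could be sketched roughly as follows. This is only an illustration of the naming scheme above, not existing Mantella code: the function name resolve_latent_path, the latent_dir argument, and the fallback to the neutral latent when no emotion-specific file exists are all assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_latent_path(
    latent_dir: Path, voice_model: str, emotion: Optional[str] = None
) -> Path:
    """Pick the latent file for a voice model and an optional emotion.

    Hypothetical helper: follows the <voice>_<emotion>.json naming idea
    from this issue and (as an assumption) falls back to the neutral
    <voice>.json latent when no emotion-specific file is found.
    """
    neutral = latent_dir / f"{voice_model}.json"

    # No emotion (or an explicit "neutral") means the plain latent is used.
    if emotion and emotion.lower() != "neutral":
        candidate = latent_dir / f"{voice_model}_{emotion.lower()}.json"
        if candidate.is_file():
            return candidate

    return neutral
```

With femalenord.json and femalenord_happy.json in the latent folder, resolve_latent_path(folder, "femalenord", "happy") would return the happy latent, while an emotion with no matching file (say "sad") would return the neutral one, so voices without extra latents keep working unchanged.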

art-from-the-machine, Apr 11 '24 18:04