TTS [Bug] Hoarseness in Higher-Pitched Female Voices with xtts-v2 after finetune

Open bensonbs opened this issue 8 months ago • 4 comments

Describe the bug

After fine-tuning the xtts-v2 model, generated higher-pitched female voices have a noticeable hoarseness, resembling the strain of a singer trying to reach high notes.

abnormal example: https://mork.ro/NQjFi

normal example: https://mork.ro/3iZ8Q#

Both voices were generated from the same model, using different audio prompts.

To Reproduce

Run inference with the fine-tuned model, once with each audio prompt.
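A minimal reproduction sketch, not the reporter's actual script: it assumes the fine-tuned checkpoint is loaded through the `Xtts` class shipped with TTS 0.22.0 and that the same text is synthesized once per audio prompt. The function names `missing_prompts` and `synthesize_with_prompts`, all file paths, the output naming, and the 24000 Hz output sample rate are assumptions made for illustration.

```python
from pathlib import Path


def missing_prompts(prompt_paths):
    """Return the audio prompt paths that do not exist on disk."""
    return [str(p) for p in prompt_paths if not Path(p).is_file()]


def synthesize_with_prompts(config_path, checkpoint_path, vocab_path,
                            prompt_paths, text, language="en"):
    """Synthesize the same text once per audio prompt with one model.

    Hypothetical helper: every path is supplied by the caller and
    nothing here is taken from the original report.
    """
    # Imported lazily so missing_prompts() works without TTS installed.
    import torch
    import torchaudio
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(config_path)
    model = Xtts.init_from_config(config)
    model.load_checkpoint(
        config,
        checkpoint_path=checkpoint_path,
        vocab_path=vocab_path,
        use_deepspeed=False,
    )
    model.cuda()  # assumes a CUDA GPU is available

    written = []
    for prompt in prompt_paths:
        # Speaker conditioning is computed from the audio prompt alone,
        # so the prompt is the only thing that differs between runs.
        gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
            audio_path=[prompt]
        )
        out = model.inference(text, language, gpt_cond_latent, speaker_embedding)
        out_path = f"{Path(prompt).stem}_out.wav"
        # 24000 Hz is assumed to be the model's output sample rate.
        torchaudio.save(out_path, torch.tensor(out["wav"]).unsqueeze(0), 24000)
        written.append(out_path)
    return written
```

Calling `missing_prompts` on the prompt list first catches path mistakes before the model is loaded. Passing one higher-pitched and one lower-pitched prompt to `synthesize_with_prompts` should then yield a pair of files comparable to the abnormal and normal examples above.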

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.13",
        "version": "#202310061235~1697396945~22.04~9283e32 SMP PREEMPT_DYNAMIC Sun O"
    }
}

Additional context

No response

Jun 02 '24 17:06 bensonbs
