[Bug] Error while Fine-Tuning TTS for Japanese Language
Describe the bug
It seems that there is a hidden issue in the dataset preparation when fine-tuning TTS on the Japanese language.
To Reproduce
- Clone the repo and install the packages.
> git clone --branch xtts_demo -q https://github.com/coqui-ai/TTS.git
> pip install --use-deprecated=legacy-resolver -q -e TTS
> pip install --use-deprecated=legacy-resolver -q -r TTS/TTS/demos/xtts_ft_demo/requirements.txt
> pip install -q typing_extensions==4.8 numpy==1.26.2
- Launch the Fine-Tuning GUI
> python TTS/TTS/demos/xtts_ft_demo/xtts_demo.py
- Add a few Japanese speech audio samples to the dataset processing tab and click "Create Dataset".
- Move to the fine-tuning tab and run the training.
The following error message pops up:

The training was interrupted due an error !! Please check the console to check the full error message! Error summary:
Traceback (most recent call last):
File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 284, in train_model
config_path, original_xtts_checkpoint, vocab_file, exp_path, speaker_wav = train_gpt(language, num_epochs, batch_size, grad_acumm, train_csv, eval_csv, output_path=output_path, max_audio_length=max_audio_length)
File "/content/TTS/TTS/demos/xtts_ft_demo/utils/gpt_train.py", line 138, in train_gpt
train_samples, eval_samples = load_tts_samples(
File "/content/TTS/TTS/tts/datasets/__init__.py", line 121, in load_tts_samples
assert len(meta_data_train) > 0, f" [!] No training samples found in {root_path}/{meta_file_train}"
AssertionError: [!] No training samples found in /tmp/xtts_ft/dataset//tmp/xtts_ft/dataset/metadata_train.csv
Expected behavior
The fine-tuning process should run without interruption.
Logs
>> DVAE weights restored from: /tmp/xtts_ft/run/training/XTTS_v2.0_original_model_files/dvae.pth
Traceback (most recent call last):
File "/content/TTS/TTS/demos/xtts_ft_demo/xtts_demo.py", line 284, in train_model
config_path, original_xtts_checkpoint, vocab_file, exp_path, speaker_wav = train_gpt(language, num_epochs, batch_size, grad_acumm, train_csv, eval_csv, output_path=output_path, max_audio_length=max_audio_length)
File "/content/TTS/TTS/demos/xtts_ft_demo/utils/gpt_train.py", line 138, in train_gpt
train_samples, eval_samples = load_tts_samples(
File "/content/TTS/TTS/tts/datasets/__init__.py", line 121, in load_tts_samples
assert len(meta_data_train) > 0, f" [!] No training samples found in {root_path}/{meta_file_train}"
AssertionError: [!] No training samples found in /tmp/xtts_ft/dataset//tmp/xtts_ft/dataset/metadata_train.csv
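As an aside, the doubled path in the assertion message is just the f-string `{root_path}/{meta_file_train}` gluing two absolute paths together; the underlying failure is that the metadata CSV contains zero samples. A minimal sketch of the string behaviour, using the values from the traceback above:

```python
import os

# Values taken from the traceback above.
root_path = "/tmp/xtts_ft/dataset"
meta_file_train = "/tmp/xtts_ft/dataset/metadata_train.csv"

# The assertion message concatenates both with an f-string, doubling the prefix:
message_path = f"{root_path}/{meta_file_train}"
print(message_path)  # /tmp/xtts_ft/dataset//tmp/xtts_ft/dataset/metadata_train.csv

# os.path.join would instead discard root_path, since the second part is absolute:
print(os.path.join(root_path, meta_file_train))  # /tmp/xtts_ft/dataset/metadata_train.csv
```

So the odd-looking path is cosmetic; the file the loader actually read was empty of samples.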
Environment
{
"CUDA": {
"GPU": [
"Tesla T4"
],
"available": true,
"version": "12.1"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "2.1.0+cu121",
"TTS": "0.20.6",
"numpy": "1.26.2"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
"ELF"
],
"processor": "x86_64",
"python": "3.10.12",
"version": "#1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023"
}
}
Additional context
No response
The same error occurs for Chinese; the data preprocessing function doesn't seem to work with CJK characters.
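To illustrate the suspicion (this is a hypothetical cleaner for illustration, not the actual Coqui preprocessing code): any text-cleaning step that only keeps Latin characters would empty CJK transcriptions entirely, so every Japanese or Chinese sample gets dropped and the metadata CSV ends up with zero rows:

```python
import re

def naive_latin_only_clean(text: str) -> str:
    # Hypothetical cleaner that keeps only Latin letters, digits and basic
    # punctuation -- any step like this silently erases Japanese/Chinese text.
    return re.sub(r"[^A-Za-z0-9 .,!?'-]", "", text).strip()

print(naive_latin_only_clean("Hello world!"))        # survives intact
print(naive_latin_only_clean("こんにちは、世界！"))   # "" -> the sample would be discarded
```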
Alright, is anyone already working on this issue?
This error message (AssertionError: [!] No training samples found in /tmp/xtts_ft/dataset//tmp/xtts_ft/dataset/metadata_train.csv) happens because the dataset processing didn't generate any dataset on which the fine-tuning process (next tab) relies.
Your dataset directory should have the following structure after the dataset processing is done: the `wavs` directory contains the dataset divided into clips, while `metadata_eval.csv` and `metadata_train.csv` map these clips to their corresponding transcriptions (see the screenshot below, where Arabic voices were used).
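A quick way to check whether the dataset step actually produced something is to validate that layout and count the rows in the metadata files. A minimal sketch (the `|` separator, the header row, and the exact file names are assumptions based on the demo's output, so adjust them if your files differ):

```python
import csv
import os

def check_xtts_dataset(root: str, sep: str = "|"):
    """Report missing pieces and metadata row counts for an XTTS ft dataset dir."""
    missing = [name for name in ("wavs", "metadata_train.csv", "metadata_eval.csv")
               if not os.path.exists(os.path.join(root, name))]
    counts = {}
    for name in ("metadata_train.csv", "metadata_eval.csv"):
        path = os.path.join(root, name)
        if os.path.isfile(path):
            with open(path, newline="", encoding="utf-8") as f:
                rows = list(csv.reader(f, delimiter=sep))
            counts[name] = max(len(rows) - 1, 0)  # assume the first row is a header
    return missing, counts
```

If `metadata_train.csv` exists but has zero data rows, the assertion quoted above is exactly what the fine-tuning tab will hit.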
- Check the quality of the input data. Try to provide high-quality audio files; this helps in data processing.
- Provide more samples of input data.
- If you're using a Whisper model for the ASR step, try a larger version of it.
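For the first point, you can sanity-check clips with the standard library before feeding them in. A small sketch (what counts as "too short" is a judgment call, not a documented threshold):

```python
import wave

def clip_info(path: str):
    """Return (sample_rate_hz, channels, duration_s) for a WAV clip."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / float(rate)
    return rate, channels, duration

# Very short, silent, or empty clips are a common reason samples get filtered
# out, so it is worth flagging them before running the dataset processing tab.
```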