
Finetuning TTS, training crashes after eval

Open R4ZZ3 opened this issue 2 years ago • 4 comments

I am trying to finetune the model for TTS. I prepared the train_manifest.json and eval_manifest.json files with my own code instead of using the dataset scripts. My train_manifest.json should have the same format as the one produced by the provided loading script: !m4t_prepare_dataset --source_lang fin --target_lang fin --split validation --save_dir ./m4t_sample_dataset

Sample of train_manifest.json:

{"source": {"id": 0, "lang": "fin", "text": "Terve! Minä olen Päkä ja tämä on minun joulukalenterini.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_0.wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 0, "lang": "fin", "text": "Terve! Minä olen Päkä ja tämä on minun joulukalenterini.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_0.wav", "waveform": null, "sampling_rate": 16000, "units": null}}
{"source": {"id": 1, "lang": "fin", "text": "Ystäväni Pulmu lensi omille teilleen juuri joulun kynnyksellä.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_1.wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1, "lang": "fin", "text": "Ystäväni Pulmu lensi omille teilleen juuri joulun kynnyksellä.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_1.wav", "waveform": null, "sampling_rate": 16000, "units": null}}
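For reference, records in this shape could be produced with a few lines of Python. This is only a minimal sketch that mirrors the fields visible in the sample; `make_manifest_entry` and `write_manifest` are hypothetical helper names, not part of seamless_communication, and the sketch assumes the manifest is read as one JSON object per line.

```python
import json


def make_manifest_entry(sample_id, lang, text, audio_path, sampling_rate=16000):
    """Build one record whose source and target sides are identical,
    as in the sample above. Hypothetical helper, not a library function."""
    side = {
        "id": sample_id,
        "lang": lang,
        "text": text,
        "audio_local_path": audio_path,
        "waveform": None,   # serialized as null
        "sampling_rate": sampling_rate,
        "units": None,      # serialized as null; no units were extracted
    }
    # Copy the dict so the two sides can be edited independently later.
    return {"source": dict(side), "target": dict(side)}


def write_manifest(entries, path):
    """Write one JSON object per line (assumed manifest layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            # ensure_ascii=False keeps characters such as "ä" readable.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Calling `write_manifest([make_manifest_entry(0, "fin", "Terve!", "clip_0.wav")], "train_manifest.json")` would write a single line with the same keys as the sample; the file name and text here are placeholders.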

However, when I try to finetune, it runs eval first (which seems to run without problems), and then when it tries to go back to finetuning it crashes: [image]

Any idea what could be wrong? Is the dataset somehow wrong? My files should be 16 kHz mono.
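One way to confirm the "16 kHz mono" assumption rather than rely on it is to read each file's header. The sketch below uses only Python's standard `wave` module, which handles plain PCM WAV files; `wav_format` and `is_16k_mono` are hypothetical names introduced here for illustration.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channel_count) read from a PCM WAV header."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels()


def is_16k_mono(path):
    """True if the file is sampled at 16000 Hz with a single channel."""
    return wav_format(path) == (16000, 1)
```

Running `is_16k_mono` over every `audio_local_path` in the manifest would flag any file that does not match the expected format. It would not by itself explain the crash.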

R4ZZ3 avatar Sep 27 '23 20:09 R4ZZ3

Any ideas? Has anyone been able to finetune the TTS model with a self-prepared train_manifest.json?

R4ZZ3 avatar Oct 05 '23 08:10 R4ZZ3

Where is the m4t_prepare_dataset script on GitHub? I can't find it. Can you tell me?

yiwei0730 avatar Nov 10 '23 07:11 yiwei0730

It is defined here: https://github.com/facebookresearch/seamless_communication/tree/main/scripts/m4t/finetune

R4ZZ3 avatar Nov 10 '23 09:11 R4ZZ3

@R4ZZ3 which mode are you using? If you don't have units, then you can only do SPEECH_TO_TEXT.
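To see whether a manifest falls into the "no units" case, a quick scan of the `units` field can help. This is a rough sketch, assuming one JSON object per line as in the sample earlier in the thread and checking only the target side; which side the finetuning code actually reads is an assumption, and `target_ids_without_units` is a hypothetical helper name.

```python
import json


def target_ids_without_units(manifest_lines):
    """Return the target ids of records whose target "units" is null or absent."""
    missing = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        target = record["target"]
        if target.get("units") is None:
            missing.append(target["id"])
    return missing
```

For the two sample records in the original post, both targets have `"units": null`, so both ids would be reported.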

mavlyutovr avatar Jan 17 '24 20:01 mavlyutovr