Finetuning TTS, training crashes after eval
I am trying to finetune the model for TTS. I prepared the train_manifest.json and eval_manifest.json files with my own code instead of using the dataset scripts. My train_manifest.json should have the same format as the one produced by the provided loading script: !m4t_prepare_dataset --source_lang fin --target_lang fin --split validation --save_dir ./m4t_sample_dataset
Sample of train_manifest.json (one JSON object per line):
{"source": {"id": 0, "lang": "fin", "text": "Terve! Minä olen Päkä ja tämä on minun joulukalenterini.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_0.wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 0, "lang": "fin", "text": "Terve! Minä olen Päkä ja tämä on minun joulukalenterini.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_0.wav", "waveform": null, "sampling_rate": 16000, "units": null}}
{"source": {"id": 1, "lang": "fin", "text": "Ystäväni Pulmu lensi omille teilleen juuri joulun kynnyksellä.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_1.wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1, "lang": "fin", "text": "Ystäväni Pulmu lensi omille teilleen juuri joulun kynnyksellä.", "audio_local_path": "datasets/out_files/LASTENKIRKON_JOULUKALENTERI_712_1.wav", "waveform": null, "sampling_rate": 16000, "units": null}}
However, when I try to finetune, it runs eval first (which seems to run without problems), and then when it tries to go back to finetuning it crashes:
Any idea what could be wrong? Is the dataset somehow wrong? My files should be 16 kHz mono.
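One way to rule out the data is to check the manifest and audio files directly. Below is a minimal sketch, assuming the manifest has one JSON object per line with the `source`/`target` fields shown in the sample, and that the audio files are plain PCM WAV (which is all the standard-library `wave` module can read). The function name `check_manifest_audio` and the `base_dir` parameter are hypothetical and not part of seamless_communication.

```python
import json
import wave
from pathlib import Path


def check_manifest_audio(manifest_path, base_dir=".", expected_rate=16000):
    """Return a list of human-readable problems found in a manifest.

    Hypothetical helper: checks that every line is valid JSON and that each
    referenced WAV file exists, is mono, and has the expected sampling rate.
    """
    problems = []
    with open(manifest_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                sample = json.loads(line)
            except json.JSONDecodeError as e:
                # Also triggers if two JSON objects share one line.
                problems.append(f"line {line_no}: invalid JSON ({e.msg})")
                continue
            for side in ("source", "target"):
                rel_path = (sample.get(side) or {}).get("audio_local_path")
                if rel_path is None:
                    continue
                wav_path = Path(base_dir) / rel_path
                if not wav_path.is_file():
                    problems.append(f"line {line_no} {side}: missing file {wav_path}")
                    continue
                try:
                    with wave.open(str(wav_path), "rb") as w:
                        channels, rate = w.getnchannels(), w.getframerate()
                except (wave.Error, EOFError) as e:
                    problems.append(f"line {line_no} {side}: unreadable WAV {wav_path} ({e})")
                    continue
                if channels != 1 or rate != expected_rate:
                    problems.append(
                        f"line {line_no} {side}: {wav_path} has "
                        f"{channels} channel(s) at {rate} Hz"
                    )
    return problems
```

Calling `check_manifest_audio("train_manifest.json")` from the directory that the relative `audio_local_path` values are resolved against returns an empty list when every file passes. This only validates the data layout; it does not explain a crash caused by something else, such as the training mode.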
Any ideas? Has anyone been able to finetune the TTS model with a self-prepared train_manifest.json?
Where is the m4t_prepare_dataset script on GitHub? I can't find it. Can you tell me?
It is defined here: https://github.com/facebookresearch/seamless_communication/tree/main/scripts/m4t/finetune
@R4ZZ3 which mode are you using? If you don't have units, then you can only do SPEECH_TO_TEXT.
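To see whether this applies to a given manifest, you can count how many samples have no target units. This is a minimal sketch, assuming the one-JSON-object-per-line layout and the `target.units` field from the sample above; `count_missing_units` is a hypothetical helper, not something the repository provides.

```python
import json


def count_missing_units(manifest_path):
    """Return (total_samples, samples_without_target_units).

    A sample counts as missing units when target.units is null or empty.
    """
    total = 0
    missing = 0
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            sample = json.loads(line)
            total += 1
            if not (sample.get("target") or {}).get("units"):
                missing += 1
    return total, missing
```

In the sample posted above, both entries have `"units": null`, so both would be counted as missing.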