
how to prepare multitask data in speech to speech

Open Saravinnai opened this issue 2 years ago • 1 comment

I am training a speech-to-speech translation model for French to English. Could anyone please help me with multitask training?
I followed the guide below for multitask training: https://github.com/facebookresearch/fairseq/blob/100cd91db19bb27277a06a25eb4154c805b10189/examples/speech_to_speech/docs/direct_s2st_discrete_units.md

I used the French text to generate the dict.txt and manifest files for the source_letter task, and the English text to generate them for the target_letter and decoder_target_ctc tasks, then trained a model, but the loss looks bad.

```yaml
source_letter:  # $TASK_NAME
  decoder_type: transformer
  dict: ${DATA_ROOT}/source_letter/dict.txt
  data: ${DATA_ROOT}/source_letter
  encoder_layer: 6
  loss_weight: 8.0
target_letter:
  decoder_type: transformer
  dict: ${DATA_ROOT}/target_letter/dict.txt
  data: ${DATA_ROOT}/target_letter
  encoder_layer: 8
  loss_weight: 8.0
decoder_target_ctc:
  decoder_type: ctc
  dict: ${DATA_ROOT}/decoder_target_ctc/dict.txt
  data: ${DATA_ROOT}/decoder_target_ctc
  decoder_layer: 3
  loss_weight: 1.6
```

I'm not sure whether I prepared this correctly. Could anyone please confirm, and if anything is wrong, help me fix it? Thanks.

Saravinnai avatar Jul 25 '23 11:07 Saravinnai

How did you prepare the .tsv files? Do you have any script? How did you generate the tokens in ${DATA_ROOT}/${TASK_NAME}/${SPLIT}.tsv files?
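One sanity check worth running on such data (a hypothetical helper, not from the fairseq docs): the `id` column in each `${DATA_ROOT}/${TASK_NAME}/${SPLIT}.tsv` should match the utterance ids in the main audio manifest, otherwise the multitask targets will not pair up with the audio. A minimal sketch, assuming both files are tab-separated with an `id` column:

```python
# Sketch: verify that every utterance id in a multitask manifest also
# appears in the main audio manifest. Paths are examples; both files are
# assumed to be tab-separated with a header row containing an "id" column.
import csv

def load_ids(tsv_path):
    """Return the set of values in the 'id' column of a TSV file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return {row["id"] for row in reader}

def check_alignment(main_tsv, task_tsv):
    """True iff every id in task_tsv also occurs in main_tsv."""
    missing = load_ids(task_tsv) - load_ids(main_tsv)
    if missing:
        print(f"{len(missing)} ids in {task_tsv} missing from {main_tsv}")
    return not missing
```

Running `check_alignment("train.tsv", "source_letter/train.tsv")` for each task directory before training can catch mismatched or renamed utterance ids early.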

WizardDutta avatar Jul 28 '25 17:07 WizardDutta