How to prepare multitask data for speech-to-speech translation
I am training a speech-to-speech translation model for French to English. Could anyone please help me with multitask training?
I followed this guide for multitask training: https://github.com/facebookresearch/fairseq/blob/100cd91db19bb27277a06a25eb4154c805b10189/examples/speech_to_speech/docs/direct_s2st_discrete_units.md
I used the French text to generate the dict.txt and manifest files for the source_letter task, and the English text to generate them for the target_letter and decoder_target_ctc tasks, then trained a model, but the loss looks bad.
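Roughly, this is how I generated the letter targets and dictionaries (a minimal sketch; the `id<TAB>tgt_text` header and the `token count` dict.txt layout are my reading of the docs, and the helper names are mine):

```python
from collections import Counter
from pathlib import Path


def text_to_letters(line: str) -> str:
    # Space-separated characters, with "|" marking word boundaries
    # (the wav2vec-style letter convention; adjust if your setup differs).
    return " ".join(line.strip().replace(" ", "|"))


def build_task(task_dir: Path, split: str, ids: list[str], texts: list[str]) -> None:
    """Write ${TASK_DIR}/${SPLIT}.tsv and a dict.txt built from that split."""
    task_dir.mkdir(parents=True, exist_ok=True)
    counter = Counter()
    with open(task_dir / f"{split}.tsv", "w", encoding="utf-8") as f:
        f.write("id\ttgt_text\n")  # header assumed by the multitask loader
        for sample_id, text in zip(ids, texts):
            letters = text_to_letters(text)
            counter.update(letters.split())
            f.write(f"{sample_id}\t{letters}\n")
    # fairseq-style dictionary: one "token count" pair per line.
    with open(task_dir / "dict.txt", "w", encoding="utf-8") as f:
        for token, count in counter.most_common():
            f.write(f"{token} {count}\n")


# e.g. for the English letter target (paths and variable names are placeholders):
# build_task(Path("${DATA_ROOT}") / "target_letter", "train", ids, english_lines)
```

The sample ids match the ids in the main ${SPLIT}.tsv audio manifest, and I build each dict.txt from the train split only and reuse it for the other splits. My config_multitask.yaml: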
```yaml
source_letter:  # $TASK_NAME
  decoder_type: transformer
  dict: ${DATA_ROOT}/source_letter/dict.txt
  data: ${DATA_ROOT}/source_letter
  encoder_layer: 6
  loss_weight: 8.0
target_letter:
  decoder_type: transformer
  dict: ${DATA_ROOT}/target_letter/dict.txt
  data: ${DATA_ROOT}/target_letter
  encoder_layer: 8
  loss_weight: 8.0
decoder_target_ctc:
  decoder_type: ctc
  dict: ${DATA_ROOT}/decoder_target_ctc/dict.txt
  data: ${DATA_ROOT}/decoder_target_ctc
  decoder_layer: 3
  loss_weight: 1.6
```
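To double-check the data, I ran a quick scan for target tokens missing from each task's dictionary (a rough sketch with the same format assumptions as above; `DATA_ROOT` is a placeholder). OOV targets get encoded as `<unk>`, which could quietly hurt the auxiliary losses:

```python
from pathlib import Path

DATA_ROOT = Path("/path/to/multitask/data")  # placeholder

for task in ("source_letter", "target_letter", "decoder_target_ctc"):
    task_dir = DATA_ROOT / task
    with open(task_dir / "dict.txt", encoding="utf-8") as f:
        vocab = {line.split()[0] for line in f if line.strip()}
    oov = set()
    with open(task_dir / "train.tsv", encoding="utf-8") as f:
        next(f)  # skip the "id\ttgt_text" header
        for line in f:
            _, tgt_text = line.rstrip("\n").split("\t", maxsplit=1)
            oov.update(tok for tok in tgt_text.split() if tok not in vocab)
    print(f"{task}: {len(oov)} OOV token types", sorted(oov)[:10])
```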
I'm not sure whether I prepared this correctly. Could anyone please confirm, and if anything is wrong, help me fix it? Thanks.
How did you prepare the .tsv files? Do you have a script for it? How did you generate the tokens in the ${DATA_ROOT}/${TASK_NAME}/${SPLIT}.tsv files?