
Model trained with lightseq performs worse than model trained with fairseq

Open alayamanas opened this issue 3 years ago • 2 comments

Machine translation, English to Chinese: I use the same data and almost the same parameters, with only the following exceptions: lightseq uses --arch ls_transformer --optimizer ls_adam --criterion ls_label_smoothed_cross_entropy, while fairseq uses --arch transformer --optimizer adam --criterion label_smoothed_cross_entropy. But I found that the performance of the lightseq model is worse than the fairseq model. The lightseq model sometimes produces the same words repeated again and again, while the fairseq model works fine.
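
For reference, a minimal sketch of how my two runs differ (DATA_BIN and SHARED_ARGS are placeholders for my actual data path and the rest of the hyperparameters, which are identical in both commands; the lightseq-train launcher is what the lightseq fairseq examples use, as far as I can tell):

```sh
# Sketch only: DATA_BIN and SHARED_ARGS stand in for my real (identical) settings.
DATA_BIN=data-bin/en2zh
SHARED_ARGS="--label-smoothing 0.1 --max-tokens 4096"   # ...plus the remaining shared params

# fairseq baseline
fairseq-train $DATA_BIN \
    --arch transformer \
    --optimizer adam \
    --criterion label_smoothed_cross_entropy \
    $SHARED_ARGS

# lightseq run: only these three flags differ
lightseq-train $DATA_BIN \
    --arch ls_transformer \
    --optimizer ls_adam \
    --criterion ls_label_smoothed_cross_entropy \
    $SHARED_ARGS
```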

What are the possible reasons?

alayamanas avatar Sep 26 '21 13:09 alayamanas

Can you reproduce our result on the WMT14 en-de dataset on your hardware and environment? https://github.com/bytedance/lightseq/blob/master/examples/training/fairseq/ls_fairseq_wmt14en2de.sh
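
Roughly, something like the following should run it from the repo root (adjust paths and environment to your setup):

```sh
# Clone the repo and run the bundled WMT14 en-de training script.
git clone https://github.com/bytedance/lightseq.git
cd lightseq
sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh
```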

neopro12 avatar Sep 27 '21 08:09 neopro12

Thanks for your reply, I'll give it a try.

alayamanas avatar Sep 27 '21 14:09 alayamanas