Add preprocess & capability for custom dataset for mt
PR types
New features
PR changes
Others
Description
Add preprocessing and support for custom datasets for machine translation (mt).
To verify the correctness of our changes, could you reproduce the experimental results on the IWSLT14 de->en dataset by following the tutorial?
We could add a new config file (transformer.iwslt14.yaml) that exactly matches the model (transformer_iwslt_de_en) and the training configuration used in the fairseq example. Since fairseq and paddlenlp share the same preprocessing script (prepare-iwslt14.sh), we should achieve similar results.
We could run this experiment on a single GPU, and it won't take too much time to converge. The final BLEU score should be around 34.85.
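To make the "exactly match" part concrete, here is a minimal sketch of a check that compares a candidate config against the fairseq reference values. The numbers are what I recall from the transformer_iwslt_de_en architecture and the IWSLT14 de->en training command in the fairseq translation example, so please verify them against the upstream README. The key names (d_model, n_head, and so on) and the diff_config helper are hypothetical and only for illustration; they are not necessarily the keys the paddlenlp config uses.

```python
# Reference hyperparameters, as recalled from fairseq's transformer_iwslt_de_en
# architecture and its IWSLT14 de->en example command (verify upstream).
# Key names are hypothetical and may differ from the real paddlenlp config.
FAIRSEQ_IWSLT14_REFERENCE = {
    "d_model": 512,           # encoder/decoder embedding dimension
    "d_inner_hid": 1024,      # feed-forward hidden dimension
    "n_head": 4,              # attention heads
    "n_layer": 6,             # encoder and decoder layers
    "dropout": 0.3,
    "label_smooth_eps": 0.1,
    "learning_rate": 5e-4,
    "warmup_steps": 4000,
    "beta1": 0.9,
    "beta2": 0.98,
    "weight_decay": 1e-4,
    "max_tokens": 4096,       # tokens per batch
}


def diff_config(candidate, reference=FAIRSEQ_IWSLT14_REFERENCE):
    """Return {key: (candidate_value, reference_value)} for every mismatch.

    A key missing from the candidate is reported with None as its value.
    """
    mismatches = {}
    for key, expected in reference.items():
        actual = candidate.get(key)
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches


if __name__ == "__main__":
    # Example: a config that still carries two base-model settings.
    candidate = dict(FAIRSEQ_IWSLT14_REFERENCE, n_head=8, dropout=0.1)
    for key, (actual, expected) in diff_config(candidate).items():
        print(f"{key}: got {actual}, expected {expected}")
```

Loading transformer.iwslt14.yaml into a dict and passing it to diff_config would flag any setting that drifts from the reference before we spend GPU time on the run.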
Do we plan to merge this into the master branch?
@guoshengCS