PaddleNLP

Add preprocess & capability for custom dataset for mt

Open FrostML opened this issue 3 years ago • 1 comments

PR types

New features

PR changes

Others

Description

Add preprocess & capability for custom dataset for mt.

FrostML avatar Sep 14 '22 10:09 FrostML

To verify the correctness of our changes, could you reproduce the experimental results on the IWSLT14 de->en dataset by following the tutorial?

We could add a new config file (transformer.iwslt14.yaml) that exactly matches the model (transformer_iwslt_de_en) and the training configuration used in the fairseq example. Since fairseq and paddlenlp share the same preprocessing script (prepare-iwslt14.sh), we should achieve similar results.

We could run this experiment on a single GPU, and it won't take too much time to converge. The final BLEU score should be around 34.85.
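
To make "exactly match" concrete, here is a rough sketch of how a candidate config could be checked against the architecture sizes of fairseq's transformer_iwslt_de_en (embedding dim 512, FFN dim 1024, 4 attention heads, 6 layers). The key names (`d_model`, `d_inner_hid`, `n_head`, `n_layer`) and the helper `config_mismatches` are assumptions for illustration only; they may not match the keys the new yaml ends up using.

```python
# Architecture sizes of fairseq's transformer_iwslt_de_en.
# NOTE: the key names are illustrative assumptions, not confirmed config keys.
FAIRSEQ_IWSLT_DE_EN = {
    "d_model": 512,       # embedding / hidden size
    "d_inner_hid": 1024,  # feed-forward inner size
    "n_head": 4,          # attention heads
    "n_layer": 6,         # encoder and decoder layers
}


def config_mismatches(candidate, reference=FAIRSEQ_IWSLT_DE_EN):
    """Return {key: (candidate_value, reference_value)} for every differing key."""
    return {
        key: (candidate.get(key), expected)
        for key, expected in reference.items()
        if candidate.get(key) != expected
    }


# A transformer-base style config differs in FFN size and head count.
base_like = {"d_model": 512, "d_inner_hid": 2048, "n_head": 8, "n_layer": 6}
print(config_mismatches(base_like))
```

An empty result would mean the candidate agrees with the reference on these four sizes; training settings (optimizer, learning rate schedule, dropout, batch size) would still need to be compared separately.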

gpengzhi avatar Sep 21 '22 04:09 gpengzhi

Do we plan to merge this into master branch?

gpengzhi avatar Nov 16 '22 12:11 gpengzhi

@guoshengCS

FrostML avatar Nov 24 '22 17:11 FrostML

@guoshengCS

FrostML avatar Nov 30 '22 04:11 FrostML