Expected value of BLEU score and perplexity
Hi, can someone please tell me the expected values of the BLEU score and perplexity after running the NMT model to translate vi to en and vice versa?
Thanks in advance.
Is the BLEU score directly proportional to the number of training steps?
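For context on what the metric measures: BLEU scores n-gram overlap between the model's output and a reference translation, reported on a 0-100 scale in these logs. A minimal sketch using NLTK (my choice purely for illustration; the tutorial computes BLEU with its own script):

# Illustration of what BLEU measures: n-gram overlap between a
# hypothesis translation and a reference. Uses NLTK, not the
# tutorial's own BLEU script.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids a zero score when some n-gram order has no
# matches, which is common for short sentences.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, hypothesis, smoothing_function=smooth)
print(f"BLEU: {100 * score:.2f}")  # same 0-100 scale as in the logs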
Note that this is for vi to en.
python -m nmt.nmt \
--src=vi --tgt=en \
--vocab_prefix=/tmp/nmt_data/vocab \
--train_prefix=/tmp/nmt_data/train \
--dev_prefix=/tmp/nmt_data/tst2012 \
--test_prefix=/tmp/nmt_data/tst2013 \
--out_dir=/tmp/nmt_model \
--num_train_steps=12000 \
--steps_per_stats=100 \
--num_layers=2 \
--num_units=128 \
--dropout=0.2 \
--metrics=bleu
Start step 0, lr 1, Mon Oct 8 18:47:27 2018
step 100 lr 1 step-time 1.87s wps 2.97K ppl 1631.77 gN 13.25 bleu 0.00, Mon Oct 8 18:50:34 2018
step 200 lr 1 step-time 1.98s wps 2.86K ppl 540.98 gN 6.85 bleu 0.00, Mon Oct 8 18:53:52 2018
step 300 lr 1 step-time 1.97s wps 2.88K ppl 357.36 gN 4.71 bleu 0.00, Mon Oct 8 18:57:09 2018
...
step 11600 lr 1 step-time 1.90s wps 2.95K ppl 34.86 gN 3.06 bleu 5.41, Tue Oct 9 01:37:38 2018
step 11700 lr 1 step-time 1.92s wps 2.93K ppl 34.97 gN 3.12 bleu 5.41, Tue Oct 9 01:40:51 2018
step 11800 lr 1 step-time 1.91s wps 2.93K ppl 34.84 gN 3.16 bleu 5.41, Tue Oct 9 01:44:02 2018
step 11900 lr 1 step-time 1.93s wps 2.92K ppl 34.39 gN 3.09 bleu 5.41, Tue Oct 9 01:47:14 2018
step 12000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 bleu 5.41, Tue Oct 9 01:50:29 2018
Final, step 12000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 dev ppl 33.49, dev bleu 5.3, test ppl 38.14, test bleu 4.5, Tue Oct 9 01:53:42 2018
Best bleu, step 11000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 dev ppl 33.26, dev bleu 5.4, test ppl 38.24, test bleu 4.6, Tue Oct 9 01:56:24 2018
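A note on reading these logs: the ppl column is perplexity, the exponential of the average per-token cross-entropy loss (in nats), so the drop from ~1632 at step 100 to ~35 at step 12000 tracks the training loss directly. A quick sanity check:

# Perplexity is exp(average per-token cross-entropy), so the ppl
# column above maps directly to a loss value in nats. Checking the
# numbers from the training log:
import math

for ppl in (1631.77, 35.37):  # step 100 vs. step 12000 above
    loss = math.log(ppl)
    print(f"ppl {ppl:8.2f}  ->  cross-entropy {loss:.2f} nats/token")

# ppl  1631.77  ->  cross-entropy 7.40 nats/token
# ppl    35.37  ->  cross-entropy 3.57 nats/token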
Thank you, ma'am.
Could you also please tell me on which dataset a BLEU score greater than 20 is obtained? Thanks in advance.
Furthermore, why is the BLEU score so low (around 5)?
Note that the English-Vietnamese parallel corpus of TED talks contains only 133K sentence pairs, which limits how well the model can generalize. For larger-scale training you can use the German-English parallel corpus (4.5M sentence pairs).
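If you want to verify those counts yourself, the data files are plain text with one sentence per line; a quick sketch (paths assume the /tmp/nmt_data layout used in the training commands above):

# Count sentence pairs in a parallel corpus; source and target files
# must align line-for-line. Paths assume the /tmp/nmt_data layout
# from the training commands above.
def count_pairs(prefix, src="vi", tgt="en"):
    with open(f"{prefix}.{src}", encoding="utf-8") as fs, \
         open(f"{prefix}.{tgt}", encoding="utf-8") as ft:
        n_src = sum(1 for _ in fs)
        n_tgt = sum(1 for _ in ft)
    assert n_src == n_tgt, "files are not parallel"
    return n_src

print(count_pairs("/tmp/nmt_data/train"))  # ~133K for IWSLT'15 en-vi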
Okay, thank you. Apologies for another question relating to my previous query: why does having fewer sentence pairs lower the BLEU score so much?
Could you please elaborate on this?
Thanks in advance.
@ranjita-naik That result is from training without attention and with default hyper-params. As described in the README.md, for vi-en, good hyper-params can give a result of around 26 BLEU. You can see the benchmarks at the link below.
https://github.com/tensorflow/nmt#benchmarks
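For intuition about what attention adds: at each decoder step the model re-weights the encoder states instead of compressing the whole source sentence into one fixed vector. A rough numpy sketch of Luong-style (multiplicative) scoring, purely illustrative and not the tutorial's actual implementation:

# Rough sketch of Luong-style (multiplicative) attention for one
# decoder step; illustrative only.
import numpy as np

def luong_attention(decoder_state, encoder_outputs):
    # decoder_state: (units,); encoder_outputs: (src_len, units)
    scores = encoder_outputs @ decoder_state   # (src_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over source positions
    context = weights @ encoder_outputs        # (units,) weighted summary
    return context, weights

rng = np.random.default_rng(0)
units, src_len = 128, 10
context, weights = luong_attention(rng.normal(size=units),
                                   rng.normal(size=(src_len, units)))
print(weights.round(3), context.shape)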
If you run the code with the standard hparams, you will get a BLEU score >20:
python -m nmt.nmt \
--src=vi --tgt=en \
--hparams_path=nmt/standard_hparams/iwslt15.json \
--vocab_prefix=/tmp/nmt_data/vocab \
--train_prefix=/tmp/nmt_data/train \
--dev_prefix=/tmp/nmt_data/tst2012 \
--test_prefix=/tmp/nmt_data/tst2013 \
--out_dir=/tmp/nmt_iwslt15