
Expected value of BLEU score and perplexity

Open RhythmIIITD opened this issue 6 years ago • 8 comments

Hi, can someone please tell me what the expected BLEU score and perplexity are after running the nmt model for translating vi to en, and vice versa?

Thanks in advance.

RhythmIIITD avatar Sep 26 '18 13:09 RhythmIIITD

Is the BLEU score directly proportional to the number of training steps?

RhythmIIITD avatar Sep 26 '18 13:09 RhythmIIITD

Note that this is for vi to en.

python -m nmt.nmt \
    --src=vi --tgt=en \
    --vocab_prefix=/tmp/nmt_data/vocab \
    --train_prefix=/tmp/nmt_data/train \
    --dev_prefix=/tmp/nmt_data/tst2012 \
    --test_prefix=/tmp/nmt_data/tst2013 \
    --out_dir=/tmp/nmt_model \
    --num_train_steps=12000 \
    --steps_per_stats=100 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu

Start step 0, lr 1, Mon Oct 8 18:47:27 2018
step 100 lr 1 step-time 1.87s wps 2.97K ppl 1631.77 gN 13.25 bleu 0.00, Mon Oct 8 18:50:34 2018
step 200 lr 1 step-time 1.98s wps 2.86K ppl 540.98 gN 6.85 bleu 0.00, Mon Oct 8 18:53:52 2018
step 300 lr 1 step-time 1.97s wps 2.88K ppl 357.36 gN 4.71 bleu 0.00, Mon Oct 8 18:57:09 2018
step 11600 lr 1 step-time 1.90s wps 2.95K ppl 34.86 gN 3.06 bleu 5.41, Tue Oct 9 01:37:38 2018
step 11700 lr 1 step-time 1.92s wps 2.93K ppl 34.97 gN 3.12 bleu 5.41, Tue Oct 9 01:40:51 2018
step 11800 lr 1 step-time 1.91s wps 2.93K ppl 34.84 gN 3.16 bleu 5.41, Tue Oct 9 01:44:02 2018
step 11900 lr 1 step-time 1.93s wps 2.92K ppl 34.39 gN 3.09 bleu 5.41, Tue Oct 9 01:47:14 2018
step 12000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 bleu 5.41, Tue Oct 9 01:50:29 2018

Final, step 12000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 dev ppl 33.49, dev bleu 5.3, test ppl 38.14, test bleu 4.5, Tue Oct 9 01:53:42 2018

Best bleu, step 11000 lr 1 step-time 1.95s wps 2.91K ppl 35.37 gN 3.17 dev ppl 33.26, dev bleu 5.4, test ppl 38.24, test bleu 4.6, Tue Oct 9 01:56:24 2018
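As a side note on reading these logs: the ppl values are just the exponential of the average per-token cross-entropy loss on the reference translation. A minimal sketch (the token probabilities below are made up for illustration):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical per-token probabilities the model assigns to the
# reference tokens; lower probabilities -> higher perplexity.
log_probs = [math.log(p) for p in [0.10, 0.05, 0.20, 0.02]]
print(perplexity(log_probs))
```

So a ppl around 35 means the model is, on average, about as uncertain as if it were choosing uniformly among 35 tokens at each step.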

ranjita-naik avatar Oct 09 '18 03:10 ranjita-naik

Thank you ma'am.

Could you also please tell me on which dataset a BLEU score greater than 20 is obtained? Thanks in advance.

RhythmIIITD avatar Oct 16 '18 06:10 RhythmIIITD

Furthermore, why is the BLEU score so low (around 5)?

RhythmIIITD avatar Oct 16 '18 06:10 RhythmIIITD

Note that the English-Vietnamese parallel corpus of TED talks contains only 133K sentence pairs. For large-scale training you can use the German-English parallel corpus (4.5M sentence pairs).
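To make the metric concrete: BLEU is essentially a geometric mean of clipped n-gram precisions against a reference translation, multiplied by a brevity penalty. Below is a simplified single-reference, sentence-level sketch, not the exact multi-bleu script this repo uses for evaluation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty (single reference)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return 100 * brevity * geo_mean

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()
print(bleu(hypothesis, reference))
```

Because the score depends on matching whole 4-grams, a model trained on a small corpus that produces only roughly correct word choices scores very low, even when individual words overlap.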

ranjita-naik avatar Oct 23 '18 12:10 ranjita-naik

Okay, thank you. Apologies for another question related to my previous query: why does having fewer sentence pairs lower the BLEU score so much?

Could you please elaborate on this?

Thanks in advance

RhythmIIITD avatar Oct 23 '18 12:10 RhythmIIITD


@ranjita-naik's result is from training without attention and with the default hyperparameters. As described in the README.md, for vi-en a good set of hyperparameters can reach a BLEU score of about 26. See the benchmarks link below.

https://github.com/tensorflow/nmt#benchmarks
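For context on why attention makes such a difference: at each decoder step the model scores every encoder state against the current decoder state and conditions on a weighted mix of them, instead of squeezing the whole source sentence into one fixed vector. A rough pure-Python sketch of the multiplicative (Luong-style) scoring, with made-up toy vectors:

```python
import math

def luong_attention(query, keys, values):
    """Multiplicative (Luong-style) attention for one decoder step:
    dot-product score per source position, softmax, weighted context."""
    scores = [sum(k * q for k, q in zip(key, query)) for key in keys]
    peak = max(scores)                             # for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]            # softmax over source
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]                # weighted sum of values
    return context, weights

# Toy example: 3 encoder states of dimension 2, one decoder query.
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.5]
context, weights = luong_attention(query, keys, values)
print(weights, context)
```

The attention weights sum to 1, so the context vector is a convex combination of the encoder states most relevant to the current output word.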

rednam-ntn avatar Jan 18 '19 06:01 rednam-ntn

If you run the code with the standard hparams, you will get a BLEU score above 20:

python -m nmt.nmt \
    --src=vi --tgt=en \
    --hparams_path=nmt/standard_hparams/iwslt15.json \
    --vocab_prefix=/tmp/nmt_data/vocab  \
    --train_prefix=/tmp/nmt_data/train \
    --dev_prefix=/tmp/nmt_data/tst2012  \
    --test_prefix=/tmp/nmt_data/tst2013 \
    --out_dir=/tmp/nmt_iwslt15

bobvo23 avatar Jan 26 '19 08:01 bobvo23