
How can I see approx_bleu on validation set?

Open jiangbojian opened this issue 7 years ago • 10 comments

Hello, I use T2T (version 1.5.5) for a translation task.

The setting I use as follows: PROBLEM=translate_enzh_wmt32k MODEL=transformer HPARAMS=transformer_base_single_gpu

I used t2t-trainer.py to train a model. When it evaluates on the validation set, it outputs this information: "loss = 8.52209, metrics-translate_enzh_wmt32k/neg_log_perplexity = -9.75649"

When I rerun t2t-trainer.py, it outputs information about loss and accuracy (or other metrics) when evaluating on the validation set. Why?

Does it output the metrics randomly? How can I see approx_bleu on the validation set during evaluation?

jiangbojian avatar Mar 13 '18 12:03 jiangbojian

I am not sure what your main question (problem) is. Use TensorBoard and possibly also t2t-bleu; see e.g. #587
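For concreteness, a minimal sketch of the two suggestions above; `model_dir`, `translations.de`, and `reference.de` are placeholder paths, not files from this thread:

```shell
# Inspect the metrics logged during training/evaluation (loss, approx_bleu, ...):
tensorboard --logdir=model_dir

# Compute real BLEU (case-sensitive and case-insensitive) on a decoded file
# against a reference; both are plain-text files, one sentence per line:
t2t-bleu --translation=translations.de --reference=reference.de
```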

martinpopel avatar Mar 13 '18 13:03 martinpopel

@martinpopel
Hello, t2t-bleu is used to get the real BLEU, but approx_bleu is computed by the bleu_score function in bleu_hook.py.

jiangbojian avatar Mar 13 '18 13:03 jiangbojian

Yes. (Both t2t-bleu and approx_bleu use the same code in bleu_hook.py, but approx_bleu applies it to subwords instead of words, and with "cheating": it looks at the previous word from the reference translation, i.e. it does not decode autoregressively, unless --eval_run_autoregressive is used.)
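For intuition, approx_bleu is essentially corpus-level BLEU applied to the model's teacher-forced subword predictions. A minimal, self-contained sketch of such a BLEU computation over token lists (an illustration, not T2T's actual bleu_hook.py code; the function names here are made up):

```python
import collections
import math

def ngram_counts(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return collections.Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_order=4):
    """Corpus BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    matches = [0] * max_order
    totals = [0] * max_order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_order + 1):
            # Counter intersection implements the "clipping" of matched n-grams.
            overlap = ngram_counts(ref, n) & ngram_counts(hyp, n)
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    precisions = [m / t if t > 0 else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return geo_mean * bp
```

Running this on subword tokens (as approx_bleu does) versus real, detokenized words (as t2t-bleu does) generally gives different scores, which is why the two numbers are not directly comparable.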

martinpopel avatar Mar 13 '18 13:03 martinpopel

I used to train a translation model with T2T (version 1.0.14). When it began to evaluate on the validation set, the log would output information like this: INFO:tensorflow:Saving dict for global step 9704: global_step = 9704, loss = 4.03075, metrics-wmt_zhen_tokens_32k/accuracy = 0.40701, metrics-wmt_zhen_tokens_32k/accuracy_per_sequence = 0.0, metrics-wmt_zhen_tokens_32k/accuracy_top5 = 0.656632, metrics-wmt_zhen_tokens_32k/approx_bleu_score = 0.120866, metrics-wmt_zhen_tokens_32k/neg_log_perplexity = -3.29166, metrics/accuracy = 0.40701, metrics/accuracy_per_sequence = 0.0, metrics/accuracy_top5 = 0.656632, metrics/approx_bleu_score = 0.120866, metrics/neg_log_perplexity = -3.29166

But when I use T2T (version 1.5.5), the information output during evaluation looks like this: [2018-03-13 19:40:59,878] Saving dict for global step 72002: global_step = 72002, loss = 2.13578, metrics-translate_enzh_wmt32k/accuracy_per_sequence = 0.00976631

Where is approx_bleu now?

jiangbojian avatar Mar 13 '18 13:03 jiangbojian

I can confirm that approx_bleu is now missing.

stefan-it avatar Mar 19 '18 18:03 stefan-it

@stefan-it It's a bug and I have fixed it now.

jiangbojian avatar Mar 20 '18 03:03 jiangbojian

I can confirm that approx_bleu is shown in the latest version of tensor2tensor, so @jiangbojian could close this issue here :)

stefan-it avatar Apr 24 '18 22:04 stefan-it

Hello, my problem is also enzh and I use transformer_base_single_gpu. However, I use my own dataset, which contains about 6 million (600W) sentence pairs. My issue is that approx_bleu_score is about 20.x, but when I run t2t-bleu, BLEU-uncased and BLEU-cased are both 5.x. I don't understand why there is such a huge difference between approx_bleu_score and BLEU-uncased/BLEU-cased. Thank you~

ttwelve12 avatar Aug 27 '18 11:08 ttwelve12

t2t-bleu is not suitable for Chinese (as the target language). Use sacrebleu --tok zh instead. See https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu
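A sketch of the suggested scoring setup; `reference.zh` and `translations.zh` are placeholder file names (plain text, one sentence per line), and sacrebleu must be installed first:

```shell
pip install sacrebleu

# Score the decoded output against the reference using sacrebleu's
# Chinese tokenizer (the hypothesis is read from stdin):
sacrebleu reference.zh --tok zh < translations.zh
```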

martinpopel avatar Sep 05 '18 11:09 martinpopel

Simple question: which script should we run to get the approx_bleu for a given checkpoint? (I'm asking because I want to compare the quality of a single checkpoint against an averaged checkpoint.)

ndvbd avatar Oct 11 '20 18:10 ndvbd