
Reported results can't be achieved in "Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models"

Open zmf0507 opened this issue 4 years ago • 4 comments

@patrickvonplaten I have been trying to reproduce the BLEU score of 31.7 (as reported in the blog and the paper for WMT en->de evaluation) with the Hugging Face model google/bert2bert_L-24_wmt_en_de, but I could only reach 23.77 on the newstest2014 test set. I kept the beam search configuration from the paper (num_beams = 4, length_penalty = 0.6) and fixed max_length = 128, as was done during training in the paper. I also used the BLEU script mentioned in the footnotes.
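
For reference, my generation setup looks roughly like the sketch below (the placeholder sentence stands in for the newstest2014 source side; the actual data loading and the BLEU scoring with the script from the footnotes are omitted):

```python
# Minimal sketch of the evaluation setup described above.
# Assumption: the placeholder sentence stands in for the real newstest2014
# source sentences; BLEU is computed separately on the decoded outputs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/bert2bert_L-24_wmt_en_de"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

en_sentences = ["Would you like a cup of coffee?"]  # placeholder input

inputs = tokenizer(
    en_sentences,
    padding=True,
    truncation=True,
    max_length=128,
    return_tensors="pt",
)

# Beam search settings from the paper: 4 beams, length penalty 0.6, max length 128.
output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_beams=4,
    length_penalty=0.6,
    max_length=128,
)

de_translations = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
print(de_translations)
```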

Can you please tell me what could be missing in this process and how I can achieve similar scores?

zmf0507 avatar Dec 28 '20 09:12 zmf0507

Hey @zmf0507 - I think this is indeed a bug on our side. I'm working on it at the moment, see: https://github.com/huggingface/transformers/issues/9041

patrickvonplaten avatar Dec 28 '20 13:12 patrickvonplaten

Oh, I hope it gets fixed soon. Anyway, thanks for the information. Please update here when it is fixed.

zmf0507 avatar Dec 28 '20 13:12 zmf0507

@patrickvonplaten is there any update?

zmf0507 avatar Feb 14 '21 19:02 zmf0507

Sorry for replying so late!

The problem is that the original code for those translation models was never published, so debugging isn't really possible. The original GitHub repository can be found here: https://github.com/google-research/google-research/tree/master/bertseq2seq and the pretrained weights here: https://tfhub.dev/google/bertseq2seq/roberta24_bbc/1, in case someone is motivated to take a deeper look.
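
For anyone who wants to dig in, the TF-Hub export can at least be pulled down and inspected. A minimal sketch (assuming the `tensorflow` and `tensorflow_hub` packages are installed; the exact serving signature isn't documented in this thread):

```python
# Minimal sketch for inspecting the original TF-Hub export.
# Assumption: `tensorflow` and `tensorflow_hub` are installed.
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/bertseq2seq/roberta24_bbc/1")

# The exact serving signature isn't documented here, so list what the
# SavedModel exposes before attempting to run inference with it.
print(list(model.signatures.keys()))
```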

patrickvonplaten avatar May 01 '21 16:05 patrickvonplaten