
How to add BERT embeddings to an OpenNMT seq2seq model

dnaihao opened this issue 3 years ago · 2 comments

(https://forum.opennmt.net/t/how-to-use-bert-embedding-into-opennmt-seq2seq-model/4430/5)

Originally I tried a seq2seq model (GloVe embeddings + RNN encoder-decoder + copy generator) on a text-to-SQL task with OpenNMT, and everything worked fine. I can get an accuracy of ~60% on the GeoQuery benchmark, the cross-entropy on the training set drops to as low as 0.10, and the token-level accuracy on the training set goes above 90%.
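For concreteness, here is a minimal sketch of the kind of GloVe initialization the baseline relies on. This is not the original code: the file name and the toy vocab are placeholders, and OpenNMT-py has its own embedding-conversion tooling that does this internally.

```python
import torch
import torch.nn as nn

def load_glove(path, vocab, dim=300):
    """Build an embedding matrix aligned with `vocab` from a GloVe text file."""
    weights = torch.randn(len(vocab), dim) * 0.1   # random init for words not in GloVe
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vec = line.rstrip().split(" ")
            if word in vocab and len(vec) == dim:
                weights[vocab[word]] = torch.tensor([float(x) for x in vec])
    return weights

vocab = {"<pad>": 0, "<unk>": 1, "select": 2, "from": 3}   # toy vocab for illustration
emb = nn.Embedding(len(vocab), 300, padding_idx=0)
emb.weight.data.copy_(load_glove("glove.6B.300d.txt", vocab))  # placeholder path
```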

When I add a BERT encoder and replace the GloVe embeddings with the last-layer output of BERT on the encoder side, the model seems to learn nothing during training. The token-level accuracy on the training set never reaches 90%, and the cross-entropy plateaus around 0.3. During inference, the model predicts unreasonable SQL and can barely achieve 1% accuracy on the test set.
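To make the substitution concrete, here is a minimal sketch (not my actual code) of feeding BERT's last hidden layer to the decoder in place of the GloVe + RNN encoder outputs. It uses the Hugging Face `transformers` API; the projection layer and the decoder hidden size of 500 are assumptions to match a typical OpenNMT RNN decoder.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertEncoder(nn.Module):
    def __init__(self, decoder_hidden=500, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # Project 768-dim BERT states to the decoder's expected hidden size.
        self.proj = nn.Linear(self.bert.config.hidden_size, decoder_hidden)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        memory = self.proj(out.last_hidden_state)  # (batch, seq, decoder_hidden)
        return memory, attention_mask

tok = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["how many states border texas"], return_tensors="pt")
enc = BertEncoder()
memory, mask = enc(batch["input_ids"], batch["attention_mask"])
```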

I have investigated this issue for quite a while. I double-checked my optimizer setup, and I use different optimizers (Adam with a learning rate of 1e-3 for the parameters of my LSTM part, BertAdam with a learning rate of 1e-5 for the BERT part). For the encoding part, I directly copied code from a published GitHub repo.
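The two-optimizer pattern I mean looks roughly like the sketch below. BertAdam comes from the older pytorch-pretrained-bert package; `torch.optim.AdamW` is the usual modern stand-in, and the two modules here are placeholders so the sketch runs on its own.

```python
import torch
import torch.nn as nn

# Placeholders standing in for the real BERT encoder and LSTM decoder.
bert_part = nn.Linear(768, 768)
lstm_part = nn.LSTM(768, 500, batch_first=True)

# Separate optimizers with the learning rates described above.
opt_bert = torch.optim.AdamW(bert_part.parameters(), lr=1e-5)
opt_lstm = torch.optim.Adam(lstm_part.parameters(), lr=1e-3)

x = torch.randn(2, 7, 768)      # fake batch: (batch, seq, features)
h, _ = lstm_part(bert_part(x))
loss = h.pow(2).mean()          # dummy loss for illustration

opt_bert.zero_grad(); opt_lstm.zero_grad()
loss.backward()
opt_bert.step(); opt_lstm.step()
```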

I could not come up with any other place where my code might go wrong. Any help would be much appreciated!

Here is the training information for the original LSTM seq2seq model: [image]

Here is the training information for the BERT + seq2seq model: [image]

Here are the SQL predictions from the original seq2seq model. We can see variation in the lengths of the predicted SQL queries, and variation in the values the model predicts in each query: [image]

Here is what BERT + seq2seq predicts. Not only does it fail to predict the long SQL queries (1-3) that the original seq2seq handles, it also predicts the same value over and over again (15-25) for different questions. This looks really weird to me. Any ideas? [image]

dnaihao · May 11 '21

Hi, have you solved this problem? I am facing the same issue: when I use BERT as the encoder, performance drops significantly.

zhuang-li · Jul 31 '21

No, sorry. It is a bit weird to me, since I am pretty sure every change I made is consistent with prior work.

dnaihao · Aug 02 '21