LV_groundhog
No Appropriate Translation
Hello everyone,
We're trying to use this for an NMT system, applying preprocessing/postprocessing techniques to mitigate the UNK problem.
We have trained a model, but when translating various test sentences we get the following output: "No Appropriate Translation".
After the necessary data-preparation steps, we ran the following commands on a large En-Fr parallel corpus:
train.py --proto=prototype_encdec_state
sample.py --source=english.txt --beam-search --beam-size 10 --trans encdec_trans.txt --state encdec_state.pkl encdec_model.npz
Any idea what's the problem?
This will happen if the beam search doesn't terminate. This has been a rare occurrence for me (less than once per 1000 sentences or so once the model is trained). Basically, the system may wrongly start to predict some word repeatedly (e.g. ", , , , , , , ," or "UNK UNK UNK UNK ...") and never output the EOS token.
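To make that failure mode concrete, here is a minimal beam-search sketch in plain Python/NumPy (the names `step_fn`, `bos_id`, and `eos_id` are hypothetical, not GroundHog's actual API): if the model keeps ranking some token above EOS, no hypothesis ever finishes within the step budget and the caller has nothing to return.

```python
import numpy as np

def beam_search(step_fn, bos_id, eos_id, beam_size=10, max_steps=100):
    """Minimal beam search. `step_fn(prefix)` is assumed to return a
    vector of next-token log-probabilities given the current prefix."""
    beams = [([bos_id], 0.0)]  # (token prefix, cumulative log-prob)
    finished = []              # hypotheses that have emitted EOS
    for _ in range(max_steps):
        candidates = []
        for prefix, score in beams:
            logp = step_fn(prefix)  # shape: (vocab_size,)
            for tok in np.argsort(logp)[-beam_size:]:
                candidates.append((prefix + [int(tok)], score + logp[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos_id:
                finished.append((prefix, score))  # terminated hypothesis
            else:
                beams.append((prefix, score))
        if not beams:
            break
    # If the model keeps ranking e.g. "," or UNK above EOS, `finished`
    # stays empty after max_steps and there is no translation to return.
    if not finished:
        return None  # -> "No Appropriate Translation"
    return max(finished, key=lambda c: c[1])[0]
```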
You could try increasing the beam size when this happens (as in fast_sample.py and https://github.com/lisa-groundhog/GroundHog/blob/master/experiments/nmt/sample.py).
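A hedged sketch of that retry strategy, reusing the hypothetical `beam_search` above:

```python
def translate_with_retries(step_fn, bos_id, eos_id, beam_sizes=(10, 20, 50)):
    """Retry the search with progressively larger beams; return None if
    none of them produces a finished hypothesis."""
    for k in beam_sizes:
        hyp = beam_search(step_fn, bos_id, eos_id, beam_size=k)
        if hyp is not None:
            return hyp
    return None  # caller may emit an empty line or use a fallback system
```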
If this doesn't happen too often, you may leave the translation empty or fall back on another system. A more satisfying approach might be to use scheduled sampling (http://arxiv.org/abs/1506.03099), so that what the system sees during decoding is closer to what it has seen during training.
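For reference, scheduled sampling boils down to occasionally feeding the decoder its own previous prediction instead of the ground-truth token during training. A minimal sketch, assuming a hypothetical `decoder_step(state, token) -> (state, logits)` interface (not GroundHog code):

```python
import numpy as np

def decode_with_scheduled_sampling(decoder_step, state, target_tokens,
                                   sample_prob):
    """One training-time decoding pass with scheduled sampling: with
    probability `sample_prob`, feed the model's own previous prediction
    instead of the ground-truth token, so the inputs seen in training
    look more like the inputs seen at decoding time."""
    prev = target_tokens[0]  # start from the ground-truth BOS token
    logits_seq = []
    for t in range(1, len(target_tokens)):
        state, logits = decoder_step(state, prev)
        logits_seq.append(logits)
        if np.random.rand() < sample_prob:
            prev = int(np.argmax(logits))  # model's own prediction
        else:
            prev = target_tokens[t]        # ground-truth token
    return logits_seq  # scored against target_tokens[1:] in the loss
```

In the paper, the probability of sampling from the model is increased gradually over training, so early epochs remain mostly teacher-forced.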
Finally, I only used the attention-based model here (both with and without LV), so it is possible that some changes I made broke the basic encoder-decoder.