LV_groundhog
No Appropriate Translation
Hello everyone,
We're trying to use this for an NMT system, applying preprocessing/postprocessing techniques to mitigate the UNK problem.
We have trained a model, but when translating various test sentences we get the following output: "No Appropriate Translation".
After the necessary data-preparation steps, we ran the following commands on a large En-Fr parallel corpus:
train.py --proto=prototype_encdec_state
sample.py --source=english.txt --beam-search --beam-size 10 --trans encdec_trans.txt --state encdec_state.pkl encdec_model.npz
Any idea what's the problem?
This will happen if the beam search doesn't terminate. This has been a rare occurrence for me (less than once per 1000 sentences or so once the model is trained). Basically, the system may wrongly start to predict some word repeatedly (e.g. ", , , , , , , ," or "UNK UNK UNK UNK ...") and never output the EOS token.
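To make that failure mode concrete, here is a minimal beam-search sketch in plain Python/NumPy (the names `step_fn`, `bos_id`, and `eos_id` are hypothetical, not GroundHog's actual API): if the model keeps ranking some token above EOS, no hypothesis ever finishes within the step budget and the caller has nothing to return.

```python
import numpy as np

def beam_search(step_fn, bos_id, eos_id, beam_size=10, max_steps=100):
    """Minimal beam search. `step_fn(prefix)` is assumed to return a
    vector of next-token log-probabilities given the current prefix."""
    beams = [([bos_id], 0.0)]  # (token prefix, cumulative log-prob)
    finished = []              # hypotheses that have emitted EOS
    for _ in range(max_steps):
        candidates = []
        for prefix, score in beams:
            logp = step_fn(prefix)  # shape: (vocab_size,)
            for tok in np.argsort(logp)[-beam_size:]:
                candidates.append((prefix + [int(tok)], score + logp[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos_id:
                finished.append((prefix, score))  # terminated hypothesis
            else:
                beams.append((prefix, score))
        if not beams:
            break
    # If the model keeps ranking e.g. "," or UNK above EOS, `finished`
    # stays empty after max_steps and there is no translation to return.
    if not finished:
        return None  # -> "No Appropriate Translation"
    return max(finished, key=lambda c: c[1])[0]
```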
You could try increasing the beam size when this happens (as in fast_sample.py and https://github.com/lisa-groundhog/GroundHog/blob/master/experiments/nmt/sample.py).
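A hedged sketch of that retry strategy, reusing the hypothetical `beam_search` above:

```python
def translate_with_retries(step_fn, bos_id, eos_id, beam_sizes=(10, 20, 50)):
    """Retry the search with progressively larger beams; return None if
    none of them produces a finished hypothesis."""
    for k in beam_sizes:
        hyp = beam_search(step_fn, bos_id, eos_id, beam_size=k)
        if hyp is not None:
            return hyp
    return None  # caller may emit an empty line or use a fallback system
```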
If this doesn't happen too often, you may leave the translation empty or fall back on another system. A more satisfying approach might be to use scheduled sampling (http://arxiv.org/abs/1506.03099), so that what the system sees during decoding is closer to what it has seen during training.
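For reference, scheduled sampling boils down to occasionally feeding the decoder its own previous prediction instead of the ground-truth token during training. A minimal sketch, assuming a hypothetical `decoder_step(state, token) -> (state, logits)` interface (not GroundHog code):

```python
import numpy as np

def decode_with_scheduled_sampling(decoder_step, state, target_tokens,
                                   sample_prob):
    """One training-time decoding pass with scheduled sampling: with
    probability `sample_prob`, feed the model's own previous prediction
    instead of the ground-truth token, so the inputs seen in training
    look more like the inputs seen at decoding time."""
    prev = target_tokens[0]  # start from the ground-truth BOS token
    logits_seq = []
    for t in range(1, len(target_tokens)):
        state, logits = decoder_step(state, prev)
        logits_seq.append(logits)
        if np.random.rand() < sample_prob:
            prev = int(np.argmax(logits))  # model's own prediction
        else:
            prev = target_tokens[t]        # ground-truth token
    return logits_seq  # scored against target_tokens[1:] in the loss
```

In the paper, the probability of sampling from the model is increased gradually over training, so early epochs remain mostly teacher-forced.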
Finally, I only used the attention-based model here (both with and without LV), so it is possible that some changes I made broke the basic encoder-decoder.