Unk replacement doesn't work with the transformer
I get the following error when decoding using the transformer with --replace-unk:
Traceback (most recent call last):
  File "../../../generate.py", line 569, in <module>
    main()
  File "../../../generate.py", line 474, in main
    generate(args)
  File "../../../generate.py", line 555, in generate
    models=models, args=args, task=task, dataset_split=args.gen_subset
  File "../../../generate.py", line 142, in _generate_score
    args, task, dataset_split, translations, align_dict
  File "../../../generate.py", line 220, in _iter_first_best_bilingual
    remove_bpe=args.remove_bpe,
  File "/home/pmichel1/.local/lib/python3.6/site-packages/fairseq-0.5.0-py3.6-linux-x86_64.egg/fairseq/utils.py", line 293, in post_process_prediction
    hypo_str = replace_unk(hypo_str, src_str, alignment, align_dict, tgt_dict.unk_string())
  File "/home/pmichel1/.local/lib/python3.6/site-packages/fairseq-0.5.0-py3.6-linux-x86_64.egg/fairseq/utils.py", line 283, in replace_unk
    src_token = src_tokens[alignment[i]]
IndexError: list index out of range
After investigating a bit, I can make an educated guess at the reason: the attention weights returned by TransformerDecoderLayer may not be masked properly, so the alignment derived from them can point past the end of the source sentence.
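To make the guess concrete, here is a toy sketch in plain Python. It is not fairseq code: `align_from_attention`, its arguments, the padding-after-real-tokens layout, and the weights are all made up for illustration. It only shows one way an out-of-range index could arise if the alignment is taken as the argmax of attention weights and padding positions are not masked out.

```python
def align_from_attention(attn_row, src_len, mask_padding):
    """Return the source index with the highest attention weight.

    attn_row: attention weights over all (padded) source positions for one
    target token. src_len: number of real source positions. Assumption for
    this toy example: padding comes after the real tokens.
    """
    if mask_padding:
        # Only real source positions may be chosen.
        candidates = range(src_len)
    else:
        # Padding positions compete with the real ones.
        candidates = range(len(attn_row))
    return max(candidates, key=lambda j: attn_row[j])


# 3 real source tokens followed by 2 padding positions (made-up weights).
attn_row = [0.10, 0.20, 0.15, 0.05, 0.50]
src_tokens = ["ein", "Haus", "<eos>"]

bad = align_from_attention(attn_row, len(src_tokens), mask_padding=False)
good = align_from_attention(attn_row, len(src_tokens), mask_padding=True)
print(bad, good)
```

In this toy case the unmasked index is larger than the last valid position in `src_tokens`, so indexing `src_tokens` with it would raise the same kind of `IndexError` as in the traceback, while the masked index stays in range.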
The error goes away when I clamp the alignment indices to the source sentence length, but this is just a band-aid.
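For reference, the workaround looks roughly like the sketch below. This is a standalone illustration, not the actual fairseq `replace_unk`: the function name `replace_unk_clamped` is hypothetical, it takes already-tokenized lists instead of strings, and the example data is made up.

```python
def replace_unk_clamped(hypo_tokens, src_tokens, alignment, align_dict, unk="<unk>"):
    """Replace unknown tokens using the aligned source token.

    Alignment indices are clamped into [0, len(src_tokens) - 1] so that an
    out-of-range index no longer raises IndexError.
    """
    out = list(hypo_tokens)
    last = len(src_tokens) - 1
    for i, tok in enumerate(out):
        if tok == unk:
            # Clamp instead of trusting the (possibly invalid) alignment.
            j = min(max(alignment[i], 0), last)
            src_token = src_tokens[j]
            # Use the dictionary entry if there is one, else copy the source token.
            out[i] = align_dict.get(src_token, src_token)
    return out


src = ["das", "Haus", "<eos>"]
hypo = ["the", "<unk>", "<unk>"]
alignment = [0, 1, 7]  # 7 is out of range for a 3-token source
result = replace_unk_clamped(hypo, src, alignment, {"Haus": "house"})
print(result)  # ['the', 'house', '<eos>']
```

The second `<unk>` gets replaced by whatever token sits at the last source position, which avoids the crash but hides the fact that the alignment itself is wrong.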
@pmichel31415 @jhcross Is this still an issue?
It should be fixed by #226, which hasn't been merged yet.