Unk replacement doesn't work with the transformer

Open pmichel31415 opened this issue 7 years ago • 3 comments

I get the following error when decoding using the transformer with --replace-unk:

Traceback (most recent call last):
  File "../../../generate.py", line 569, in <module>
    main()
  File "../../../generate.py", line 474, in main
    generate(args)
  File "../../../generate.py", line 555, in generate
    models=models, args=args, task=task, dataset_split=args.gen_subset
  File "../../../generate.py", line 142, in _generate_score
    args, task, dataset_split, translations, align_dict
  File "../../../generate.py", line 220, in _iter_first_best_bilingual
    remove_bpe=args.remove_bpe,
  File "/home/pmichel1/.local/lib/python3.6/site-packages/fairseq-0.5.0-py3.6-linux-x86_64.egg/fairseq/utils.py", line 293, in post_process_prediction
    hypo_str = replace_unk(hypo_str, src_str, alignment, align_dict, tgt_dict.unk_string())
  File "/home/pmichel1/.local/lib/python3.6/site-packages/fairseq-0.5.0-py3.6-linux-x86_64.egg/fairseq/utils.py", line 283, in replace_unk
    src_token = src_tokens[alignment[i]]
IndexError: list index out of range

pmichel31415 avatar Sep 25 '18 16:09 pmichel31415

After investigating a bit, I can make an educated guess at the cause: the attention weights returned by the transformerDecoderLayer may not be masked properly.
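
To illustrate the suspicion, here is a minimal sketch, not fairseq code. If attention weights over padded source positions are not masked before taking the argmax, the resulting alignment index can point past the real source length, which would produce the IndexError above. The function name `argmax_alignment`, the toy numbers, and the assumption that padding sits on the right are all hypothetical and chosen only for illustration.

```python
def argmax_alignment(attn_row, src_len=None):
    """Return the source position with the highest attention weight.

    If src_len is given, positions >= src_len (assumed to be padding)
    are ignored; otherwise every position is considered.
    """
    limit = len(attn_row) if src_len is None else src_len
    return max(range(limit), key=lambda j: attn_row[j])


# Toy attention row for a 3-token source padded to length 5.
attn_row = [0.10, 0.15, 0.05, 0.40, 0.30]
src_tokens = ["a", "b", "c"]

# Without masking, the argmax can land on a padding slot.
unmasked = argmax_alignment(attn_row)
# With masking, the argmax stays inside the real source sentence.
masked = argmax_alignment(attn_row, len(src_tokens))
```

Here `src_tokens[unmasked]` would fail, while `src_tokens[masked]` is a valid lookup.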

The error goes away when I clamp the alignment indices to the source sentence length, but this is just a band-aid.
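
The workaround could look roughly like the sketch below. It is a standalone approximation, not fairseq's actual `replace_unk`: only the indexing line `src_tokens[alignment[i]]` comes from the traceback, while the whitespace tokenization, the function name `replace_unk_clamped`, and the fallback to the raw source token are assumptions.

```python
def replace_unk_clamped(hypo_str, src_str, alignment, align_dict, unk):
    """Replace unknown tokens in hypo_str with aligned source tokens.

    Alignment indices are clamped to [0, len(src_tokens) - 1] so that an
    out-of-range index cannot raise IndexError. This hides the symptom
    only; it does not fix the underlying attention masking.
    """
    hypo_tokens = hypo_str.split()  # assumption: whitespace tokenization
    src_tokens = src_str.split()
    if not src_tokens:
        return hypo_str  # nothing to copy from
    last = len(src_tokens) - 1
    for i, tok in enumerate(hypo_tokens):
        if tok == unk:
            j = min(max(alignment[i], 0), last)  # clamp the index
            src_token = src_tokens[j]
            # Use the dictionary translation if there is one,
            # otherwise copy the source token as-is.
            hypo_tokens[i] = align_dict.get(src_token, src_token)
    return " ".join(hypo_tokens)


# The second alignment index (5) is past the end of a 2-token source.
result = replace_unk_clamped("hello <unk>", "bonjour Paris", [0, 5], {}, "<unk>")
```

Clamping silently maps any bad index to the last source token, so a wrong replacement can go unnoticed.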

pmichel31415 avatar Sep 25 '18 18:09 pmichel31415

@pmichel31415 @jhcross Is this still an issue?

liezl200 avatar Nov 06 '18 15:11 liezl200

It should be fixed by #226, which hasn't been merged yet.

pmichel31415 avatar Nov 06 '18 15:11 pmichel31415