
Word features in Translation

vikrant97 opened this issue 4 years ago • 19 comments

I am trying to use word-level features based on the paper "Linguistic Input Features Improve Neural Machine Translation", but couldn't find any clear documentation on how to use them. I want to use different feature vector sizes for different features. Can anyone help me out?

vikrant97 avatar Aug 20 '19 09:08 vikrant97

I am getting this error while training. Training command:

python train.py -data ../data/word_with_pos_only -save_model ../models/word_with_pos_only_emb12 -layers 6 -rnn_size 512 -word_vec_size 500 -transformer_ff 2048 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 150000 -max_generator_batches 2 -dropout 0.3 -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 1 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 5000 -save_checkpoint_steps 20000 -world_size 1 -gpu_ranks 0

example: महानगर│NN पालिका│NNPC अंतर्गत│JJ दत्तात्रय│NNPC नगर│NNPC माध्यमिक│NNPC स्कूल│NN के│PSP विद्यार्थियों│NN ने│PSP काल्पनिक│JJ किला│NN दत्तगढ़│NNP बनाकर│VM अपनी│PRP कल्पनाशक्ति│NN का│PSP परिचय│NN दिया│VM

Traceback (most recent call last):
  File "/home/vikrant.goyal/OpenNMT-py/train.py", line 109, in <module>
    main(opt)
  File "/home/vikrant.goyal/OpenNMT-py/train.py", line 39, in main
    single_main(opt, 0)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/train_single.py", line 127, in main
    valid_steps=opt.valid_steps)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/trainer.py", line 249, in train
    report_stats)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/trainer.py", line 364, in _gradient_accumulation
    outputs, attns = self.model(src, tgt, src_lengths, bptt=bptt)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/models/model.py", line 46, in forward
    memory_lengths=lengths)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/decoders/transformer.py", line 215, in forward
    step=step)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/decoders/transformer.py", line 69, in forward
    input_norm = self.layer_norm_1(inputs)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/modules/normalization.py", line 157, in forward
    input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/nn/functional.py", line 1725, in layer_norm
    torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[512], expected input with shape [*, 512], but got input of size[227, 16, 500]

Edit: It works if I use the default options of train.py, but not with the above-mentioned command:

python train.py -data ../data/word_with_pos_only -save_model ../models/word_with_pos_only -save_checkpoint_steps 20000 -world_size 1 -gpu_ranks 0

vikrant97 avatar Aug 21 '19 06:08 vikrant97
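
For readers unfamiliar with the input format: each source token carries its features inline, joined by a separator character, as in the Hindi example above. A minimal sketch of producing such lines follows; the helper name is made up, and the separator character OpenNMT-py actually expects may differ by version, so treat both as assumptions.

# Sketch: annotate a tokenized sentence with one feature per token,
# using the separator shown in the example above. The separator and
# the helper itself are illustrative assumptions, not OpenNMT-py code.
def annotate(tokens, tags, sep="│"):
    assert len(tokens) == len(tags)
    return " ".join(f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags))

print(annotate(["The", "cat", "sat"], ["DT", "NN", "VBD"]))
# The│DT cat│NN sat│VBD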

I figured out that it works if I make rnn_size equal to the total word_vec_size (i.e. including the feature dimensions). But the problem is that I have to use a different feature vec size for each feature, and the above constraint stops me from doing that. @vince62s Can you help me out with this?

vikrant97 avatar Aug 22 '19 07:08 vikrant97

@vikrant97 your request is unclear to me. Each feature must have a constant vec size. E.g., if you have two features F1 and F2, then each example must have the same vec size for its F1 and the same for F2. However, F1's vec size can be different from F2's vec size. Are you asking something else?

vince62s avatar Aug 23 '19 10:08 vince62s
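
To make this concrete, here is a minimal PyTorch sketch of concat-style feature merging with a separate embedding table per input column. This is an illustration, not OpenNMT-py's actual embedding code; all vocab sizes and dimensions below are assumptions.

import torch
import torch.nn as nn

# One embedding table per input column (word, F1, F2); the per-token
# vectors are concatenated, so the model width is 500 + 6 + 6 = 512.
word_emb = nn.Embedding(50002, 500, padding_idx=1)  # words (assumed vocab)
f1_emb = nn.Embedding(52, 6, padding_idx=1)         # feature 1, e.g. POS tags
f2_emb = nn.Embedding(52, 6, padding_idx=1)         # feature 2, another tag set

def embed(words, f1, f2):
    # words/f1/f2: LongTensors of shape (seq_len, batch)
    return torch.cat([word_emb(words), f1_emb(f1), f2_emb(f2)], dim=-1)

tokens = torch.randint(2, 50002, (10, 4))
pos = torch.randint(2, 52, (10, 4))
other = torch.randint(2, 52, (10, 4))
print(embed(tokens, pos, other).shape)  # torch.Size([10, 4, 512])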

@vince62s Hi, I have two problems:-

  1. When using the transformer model, if I choose word_vec_size to be different from rnn_size, it throws an error. For example, with rnn_size=512 & word_vec_size=500, it throws an error saying that it expects the word vec dimension to be equal to 512.

  2. There is no option to specify different vec sizes for two different features, say F1 & F2. The toolkit has a parameter -feat_vec_size, which is an int: it takes only one value and uses it as the vec size for all features (say F1 & F2) when the feat_merge operation is "concat". I am saying it should take a list of feature vec sizes for the different features F1, F2, F3, ...

P.S. I edited the code to accept a list of feature vec sizes for different features F1, F2, F3..., but it throws an error due to a conflict with rnn_size.

I hope my issue is clear. If so, can you please help me with these two problems?

vikrant97 avatar Aug 23 '19 14:08 vikrant97

OK, then what is the issue with setting, for instance, emb_size 500, feat_vec_size 6, hidden 512? If you have 2 features, it should work fine. It is not a big deal to set the feat_vec_size to the highest of what you need, is it?

vince62s avatar Aug 23 '19 19:08 vince62s

@vince62s Yes, that works fine because the total vec_size comes out to be 512 (therefore no conflict with rnn_size). But the actual problem is this: suppose you want to use F1 (POS tag), which has a max vocab size of 50, & another feature F2 (say lemma), which has a vocab size of 50k. Using the same embedding size for both features would not be appropriate. I would say an embedding size of 10 would be appropriate for F1, & F2 should have an embedding size close to that of a word.

So I guess the trick you suggested will not work here, or am I missing something?

vikrant97 avatar Aug 24 '19 07:08 vikrant97
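
For context on sizing a feature embedding by its vocabulary: OpenNMT-py exposes a related heuristic through -feat_vec_exponent (used with concat merging when feat_vec_size is not set; the warning later in this thread mentions it). The sketch below assumes a power-law rule of roughly that shape; the exact default and rounding are assumptions, so check opts.py for the real rule.

# Rough vocab-size-based sizing, as vikrant97 describes: small vocabs
# get small embeddings. The 0.7 exponent is an assumed default.
def feat_dim(vocab_size, exponent=0.7):
    return int(round(vocab_size ** exponent))

print(feat_dim(50))     # POS-like feature: ~15 dims
print(feat_dim(50000))  # lemma-like feature: ~1945 dims -> cap near word size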

OK, I see, what you are asking for is in fact a duplicate of this: https://github.com/OpenNMT/OpenNMT-py/issues/344

@bpopeters since you started to work on it, can you give him some pointers so that he can submit a PR? Thanks

vince62s avatar Aug 24 '19 08:08 vince62s

Hi Vikrant,

Are the word features back?


eduamf avatar Aug 28 '19 23:08 eduamf

Hi @eduamf, the "word features" thing works if you want the same feature vec size for all features.

vikrant97 avatar Aug 31 '19 20:08 vikrant97

@vince62s I have updated (locally) the OpenNMT code to incorporate word features on the source side. It trains without any errors but fails at testing. The error seems to occur when loading the trained model. Can you please help me out with this error?

Traceback (most recent call last):
  File "translate.py", line 48, in <module>
    main(opt)
  File "translate.py", line 19, in main
    translator = build_translator(opt, report_score=True)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/translate/translator.py", line 28, in build_translator
    fields, model, model_opt = load_test_model(opt)
  File "/home/vikrant.goyal/OpenNMT-py/onmt/model_builder.py", line 85, in load_test_model
    map_location=lambda storage, loc: storage)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/vikrant.goyal/OpenNMT-py/myenv/lib/python3.5/site-packages/torch/serialization.py", line 564, in _load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

vikrant97 avatar Sep 16 '19 10:09 vikrant97

If you want to discuss some of your code, please open a [WIP] PR and then we can see what is going right and wrong. Without knowing what you have done, it is difficult to help.

Most likely there could be an issue with your input features at inference.

vince62s avatar Sep 16 '19 10:09 vince62s

@vince62s The issue is solved & working on my end. Please check the PR that I have submitted.

After this, I will also look into target-side features, to see if that works, and into limiting the feature vocab sizes to some particular threshold (because that's needed for features like lemma). Thanks!

vikrant97 avatar Sep 16 '19 15:09 vikrant97

Based on what's written here, I have a model with 4 features, so I set the feat vec size to be the same for all of them. But I still get an error.

--rnn_size 512 \
--word_vec_size 384 \
--feat_vec_size 32 \

RuntimeError: Given normalized_shape=[512], expected input with shape [*, 512], but got input of size[81, 25, 384]

When I run with regular word vector size I also get the error

--rnn_size 512 \
--word_vec_size 512 \
--feat_vec_size 32 \

RuntimeError: Given normalized_shape=[512], expected input with shape [*, 512], but got input of size[39, 102, 640]

Henry-E avatar Nov 05 '19 10:11 Henry-E
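
A plausible reading of these two errors, assuming 4 source-side features and no target-side features (both assumptions):

# First run: encoder width adds up, but the decoder (no features)
# only has the 384-dim word embeddings, which != rnn_size.
rnn_size, feat_vec_size, n_feats = 512, 32, 4

word_vec_size = 384
print(word_vec_size + n_feats * feat_vec_size)  # 512 -> encoder OK
print(word_vec_size)                            # 384 -> decoder != 512

# Second run: decoder is fine at 512, but the encoder overshoots.
word_vec_size = 512
print(word_vec_size + n_feats * feat_vec_size)  # 640 -> encoder != 512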

@Henry-E Just trying to document this a bit. When you have one extra feature of size 12:

src_word_vec_size = 500
feat_vec_size = 12
rnn_size = 512
tgt_word_vec_size = 512
share_embeddings = false

The 2 errors you reported are one from the encoder and one from the decoder. This setup is not trivial, and we need to document it better and catch misconfigurations before training.

Once #1564 and #1710 are merged, it will be easier.

vince62s avatar May 09 '20 17:05 vince62s

Thanks for following up. Are there any actions that you need from me to recreate this?

Henry-E avatar May 12 '20 10:05 Henry-E

I'm trying to run source features with a transformer, and I'm getting this error:

[2020-06-23 14:48:45,332 INFO]  * src_feat_0 vocab size = 6
[2020-06-23 14:48:45,333 INFO]  * tgt vocab size = 50004
[2020-06-23 14:48:45,333 INFO] Building model...
Traceback (most recent call last):
  File "/home/walsha94/OpenNMT-py/train.py", line 6, in <module>
    main()
  File "/home/walsha94/OpenNMT-py/onmt/bin/train.py", line 209, in main
    train(opt)
  File "/home/walsha94/OpenNMT-py/onmt/bin/train.py", line 91, in train
    single_main(opt, 0)
  File "/home/walsha94/OpenNMT-py/onmt/train_single.py", line 87, in main
    model = build_model(model_opt, opt, fields, checkpoint)
  File "/home/walsha94/OpenNMT-py/onmt/model_builder.py", line 242, in build_model
    model = build_base_model(model_opt, fields, use_gpu(opt), checkpoint)
  File "/home/walsha94/OpenNMT-py/onmt/model_builder.py", line 144, in build_base_model
    src_emb = build_embeddings(model_opt, src_field)
  File "/home/walsha94/OpenNMT-py/onmt/model_builder.py", line 62, in build_embeddings
    fix_word_vecs=fix_word_vecs
  File "/home/walsha94/OpenNMT-py/onmt/modules/embeddings.py", line 199, in __init__
    pe = PositionalEncoding(dropout, self.embedding_size)
  File "/home/walsha94/OpenNMT-py/onmt/modules/embeddings.py", line 25, in __init__
    "odd dim (got dim={:d})".format(dim))
ValueError: Cannot use sin/cos positional encoding with odd dim (got dim=515)

Here are my parameters:

python3 /home/walsha94/OpenNMT-py/train.py -data ../trans/semi_fixed_test_en/EN_transformer_semi_fixed_test_en 
-save_model ../models/trans/semi_fixed_test_en/EN_transformer_semi_fixed_test_en 
-layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 
-encoder_type transformer -decoder_type transformer -position_encoding 
-train_steps 200000 -max_generator_batches 2 -dropout 0.1 -batch_size 4096 
-batch_type tokens -normalization tokens -accum_count 2 -optim adam 
-adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 
-max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 
-valid_steps 10000 -save_checkpoint_steps 10000 -world_size 1 -gpu_ranks 0

eihe avatar Jun 23 '20 13:06 eihe

Hi there! With the transformer architecture, you must have word_vec_size + feat_vec_size == rnn_size. Here it can't be the case, since rnn_size == word_vec_size and feat_vec_size will be > 0. The easiest fix here is probably to set word_vec_size to a lower value, and feat_vec_size to rnn_size - word_vec_size. See here for instance: https://forum.opennmt.net/t/does-word-embedding-size-change-when-we-use-word-features/2785

francoishernandez avatar Jun 23 '20 14:06 francoishernandez
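
To make the constraint concrete, here is a tiny pre-flight check one might run on a planned configuration. This is a sketch, not part of OpenNMT-py; the function name is made up, and the parameter names merely mirror the CLI flags.

# Sanity-check the transformer constraint described above:
# word_vec_size + n_feats * feat_vec_size must equal rnn_size.
def check_dims(rnn_size, word_vec_size, feat_vec_size=0, n_feats=0):
    total = word_vec_size + n_feats * feat_vec_size
    if total != rnn_size:
        raise ValueError(
            f"embedding width {total} != rnn_size {rnn_size}; "
            f"try word_vec_size = {rnn_size - n_feats * feat_vec_size}")

check_dims(rnn_size=512, word_vec_size=500, feat_vec_size=12, n_feats=1)  # passes
# check_dims(rnn_size=512, word_vec_size=512, feat_vec_size=12, n_feats=1)  # raises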

Hi @francoishernandez, thanks for the help! I tried what you suggested, and this is the output I got:

python3 /home/walsha94/OpenNMT-py/train.py -data $NAME -save_model models/$NAME -layers 6 
-rnn_size 512 -word_vec_size 500 -feat_vec_size 12 -transformer_ff 2048 -heads 8 
-encoder_type transformer -decoder_type transformer -position_encoding -train_steps 200000  -max_generator_batches 2 -dropout 0.1 -batch_size 4096 -batch_type tokens -normalization tokens  -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 -learning_rate 2 -max_grad_norm 0 -param_init 0  -param_init_glorot -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 -world_size 2 -gpu_ranks 0 1

[2020-06-23 15:15:43,945 INFO] Loading dataset from trans/semi_fixed_test_en/EN_transformer_semi_fixed_test_en.train.0.pt
[2020-06-23 15:15:44,133 INFO]  * src vocab size = 50002
[2020-06-23 15:15:44,134 INFO]  * src_feat_0 vocab size = 6
[2020-06-23 15:15:44,134 INFO]  * tgt vocab size = 50004
[2020-06-23 15:15:44,134 INFO] Building model...
/home/walsha94/OpenNMT-py/onmt/modules/embeddings.py:218: UserWarning: Not merging with sum and positive feat_vec_size, but got non-default feat_vec_exponent. It will be unused.
  warnings.warn("Not merging with sum and positive "
[2020-06-23 15:15:53,880 INFO] NMTModel(
  (encoder): TransformerEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(50002, 500, padding_idx=1)
          (1): Embedding(6, 12, padding_idx=1)
        )
        (pe): PositionalEncoding(
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )

eihe avatar Jun 23 '20 14:06 eihe

@vince62s Are there any plans to include source word features in the new data processing pipeline? They're widely used in many non-NMT applications such as NLG.

Henry-E avatar Dec 11 '20 10:12 Henry-E

Wow, closed as completed!

Henry-E avatar Dec 07 '22 16:12 Henry-E

No, @anderleich is working on it, but it may take time. Just cleaning up old issues.

vince62s avatar Dec 07 '22 16:12 vince62s