Allow seq2seq models to use multiple decoder layers
Hi there,
I am doing some generation tasks with allennlp-models.
I noticed that all of the seq2seq models use a single LSTMCell for decoding, which makes it hard to use two (or more) LSTM layers, whereas the transformer decoder can easily be extended to more layers.
I wonder whether there is a reason for this design that I'm not aware of?
@schmmd I compared allennlp with Open-NMT, and it seems we use the same code for training and validation during generation. This makes it hard to use a multi-layer decoder. Any ideas about how to solve it?
@dirkgr @epwalsh Sorry to bother you, but could you suggest a way to construct a two-layer LSTM decoder in allennlp-models? I tried but failed.
I'm not super familiar with that code, but it seems to me that you'll have to modify `LstmCellDecoderNet` to use two `LSTMCell`s instead of just one. You can modify `previous_state` to carry the output from both cells, either as a concatenated tensor or separately. Maybe @epwalsh knows more about it?
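For concreteness, here is a minimal PyTorch sketch of that idea, independent of the actual `LstmCellDecoderNet` API: two stacked `LSTMCell`s, with the hidden and cell states of both layers carried in the state dict. The key names (`hidden_0`, `context_0`, etc.) are hypothetical.

```python
import torch
import torch.nn as nn
from typing import Dict, Tuple


class TwoLayerLstmCellStep(nn.Module):
    """One decoding step through two stacked LSTMCells (illustrative sketch)."""

    def __init__(self, input_dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.cell_0 = nn.LSTMCell(input_dim, hidden_dim)
        self.cell_1 = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(
        self, inputs: torch.Tensor, state: Dict[str, torch.Tensor]
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        # The first layer consumes the step input (e.g. the embedded previous prediction).
        h0, c0 = self.cell_0(inputs, (state["hidden_0"], state["context_0"]))
        # The second layer consumes the first layer's hidden state.
        h1, c1 = self.cell_1(h0, (state["hidden_1"], state["context_1"]))
        new_state = {"hidden_0": h0, "context_0": c0, "hidden_1": h1, "context_1": c1}
        # The top layer's hidden state is the decoder output for this step.
        return h1, new_state
```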
You can see how we handled this in the semantic parsing code here: https://github.com/allenai/allennlp-semparse/blob/c8bbe4e9fdf4fcb82af4e7c5360e80d51e0898eb/allennlp_semparse/state_machines/transition_functions/basic_transition_function.py#L80-L85. Making a similar change in the seq2seq code would be great, if you want to submit a PR for it.
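The linked semparse code keeps the per-layer states stacked along a leading `num_layers` dimension rather than in separate entries. A hedged sketch of that general pattern (illustrative names, not the semparse code itself) could look like this:

```python
import torch
import torch.nn as nn


class StackedLstmCellStep(nn.Module):
    """One decoding step through N stacked LSTMCells, with states kept as
    (num_layers, batch_size, hidden_dim) tensors (illustrative sketch)."""

    def __init__(self, input_dim: int, hidden_dim: int, num_layers: int) -> None:
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_dim if i == 0 else hidden_dim, hidden_dim)
             for i in range(num_layers)]
        )

    def forward(self, inputs: torch.Tensor, hidden: torch.Tensor, context: torch.Tensor):
        # hidden, context: (num_layers, batch_size, hidden_dim)
        new_hidden, new_context = [], []
        layer_input = inputs
        for i, cell in enumerate(self.cells):
            h, c = cell(layer_input, (hidden[i], context[i]))
            new_hidden.append(h)
            new_context.append(c)
            layer_input = h  # feed this layer's output to the next layer
        # Return the top layer's output plus the restacked per-layer states.
        return layer_input, torch.stack(new_hidden), torch.stack(new_context)
```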
Hi there, I added two new PRs for this feature: one for allennlp-models (#90) and one for allennlp (#4462).
The changes in allennlp-models are straightforward, but I am not sure about my changes for #4462 (beam_search).
The state accepts tensors of shape (batch_size, *), but the decoder state in multi-layer decoders has shape (num_layers, batch_size, *), so I added a new dimension to the related state tensors (see the sketch below). Since I am not well acquainted with the beam_search code, could you help me review these changes? @matt-gardner
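To illustrate the shape issue (a hedged sketch of the general idea, not the code from the PRs): when beam search reorders its state tensors between steps, a tensor that carries an extra leading num_layers dimension has to be indexed along dimension 1 instead of dimension 0.

```python
import torch


def reorder_state(state_tensor: torch.Tensor,
                  backpointers: torch.Tensor,
                  multilayer: bool) -> torch.Tensor:
    # backpointers: flattened (batch_size * beam_size,) indices selecting, for
    # each new beam entry, which old entry it continues.
    if multilayer:
        # state_tensor: (num_layers, batch_size * beam_size, *)
        # The batch/beam axis is now dim 1, so select along it.
        return state_tensor.index_select(1, backpointers)
    # state_tensor: (batch_size * beam_size, *)
    return state_tensor.index_select(0, backpointers)
```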
Hi there,
Just want to tag along with this post: I am currently using copynet_seq2seq and would like to have the multi-layer decoder feature added, since I want to use a multi-layer bidirectional LSTM encoder. Is this functionality on the roadmap? I assume the implementation would be similar to the one mentioned above. Thank you!
Hi @shimengfeng, it's not on our immediate roadmap to add this to CopyNet, but I think it would be a great feature to have. Contributions definitely welcome on this.
Awesome, I will give it a try this weekend. Thank you for the confirmation.