allennlp icon indicating copy to clipboard operation
allennlp copied to clipboard

Allow seq2seq models to use multiple decoder layers

Open wlhgtc opened this issue 4 years ago • 8 comments

Hi there, I am doing some generation task with allennlp-model.

I find all the models in seq2seq use LSTMCell for decoding. It makes me hard to use 2(or more) layers of LSTM. But the transformer decoder could be easily extend to more layers.

I wonder whether this setting is meaningful in some situations that I don't know?

wlhgtc avatar Jul 08 '20 08:07 wlhgtc

@schmmd I compared allennlp with Open-NMT, seem we use the same code in train and valid for generation. This makes hard to use multi-layer decoder. Any idea about solve it ?

wlhgtc avatar Jul 10 '20 08:07 wlhgtc

@dirkgr @epwalsh Sorry for bother, could you give me some method about how to construct a 2 layer LSTM decoder in allennlp-model ? I tried but failed.

wlhgtc avatar Jul 10 '20 11:07 wlhgtc

I'm not super familiar with that code, but it seems to me that you'll have to modify LstmCellDecoderNet to use two LSTMCells instead of just one. You can modify previous_state to carry the output from both cells, either as a concatenated tensor, or separately. Maybe @epwalsh knows more about it?

dirkgr avatar Jul 10 '20 12:07 dirkgr

You can see how we handled this in the semantic parsing code here: https://github.com/allenai/allennlp-semparse/blob/c8bbe4e9fdf4fcb82af4e7c5360e80d51e0898eb/allennlp_semparse/state_machines/transition_functions/basic_transition_function.py#L80-L85. Making a similar change in the seq2seq code would be great, if you want to submit a PR for it.

matt-gardner avatar Jul 10 '20 16:07 matt-gardner

Hi there, I add 2 new pr for this feature, one for allennlp-models #90 and the other for allennlp #4462 .

The changes in allenlp-models is easy. But I am not sure about my changes for #4462 (beam_search) The state accepts tensors like (batch, *), but the decoder state in multi-layer decoders is (num_layers, batch_size, *). So I add a new dimension in these related state. Cause I am not acquainted with code in beam_search, Could you help me judge these code?@matt-gardner

wlhgtc avatar Jul 12 '20 03:07 wlhgtc

Hi there,

Just want to tag along with this post since I am currently using copynet_seq2seq and would like to have the multi-layer decoder feature added since I would like to use a multi-layer bidirectional LSTM encoder. Is this functionality on the roadmap? I would assume this is similar to the implementation mentioned above. Thank you!

shimengfeng avatar Oct 30 '20 13:10 shimengfeng

Hi @shimengfeng, it's not on our immediate roadmap to add this to CopyNet, but I think it would be a great feature to have. Contributions definitely welcome on this.

epwalsh avatar Oct 30 '20 16:10 epwalsh

Awesome, I will give it a try this weekend. Thank you for the confirmation

shimengfeng avatar Oct 30 '20 19:10 shimengfeng