
Allow Copynet models to use multiple decoder layers

Open shimengfeng opened this issue 5 years ago • 13 comments

Is your feature request related to a problem? Please describe. The current CopyNet architecture only allows one LSTM decoder layer. This is similar to #4451

Describe the solution you'd like Give users the option of using multiple decoder layers, for a more capable CopyNet encoder-decoder architecture.

Describe alternatives you've considered N/A

Additional context N/A

shimengfeng avatar Nov 01 '20 23:11 shimengfeng

This would be great!

JohnGiorgi avatar Nov 13 '20 21:11 JohnGiorgi

Hi all, I am planning on working on this and will update the issue again when a PR is ready for review. Thanks!

annajung avatar Mar 30 '21 16:03 annajung

Hi @JohnGiorgi, I was wondering if you could help me understand how the copy score computation changes when there is more than one decoder layer. Would it be correct to grab the last layer from state["decoder_hidden"]?

https://github.com/allenai/allennlp-models/blob/abc7e1991127540af4a9cd3f7d49129401ae47b9/allennlp_models/generation/models/copynet_seq2seq.py#L357

annajung avatar Mar 31 '21 14:03 annajung

@annajung Sorry I am getting back so late! Are the decoder layers all RNNs? If so I think it makes the most sense to use the last hidden vector output from the last decoder layer.

If the decoder can be something else (like a transformer), I think the answer is a little more complicated.
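To make that concrete, here is a minimal sketch of the RNN case, assuming the decoder is a standard multi-layer PyTorch LSTM whose hidden state has shape (num_layers, batch, hidden_size) and that copy scores are computed from the top layer only. The tensor names and projection layer below are illustrative, not the actual CopyNet attributes:

```python
import torch

# Hypothetical shapes: a multi-layer LSTM decoder keeps its hidden state as
# (num_layers, batch_size, hidden_size); the copy scores only need the top layer.
num_layers, batch_size, src_len, hidden_size = 2, 4, 10, 8
decoder_hidden = torch.randn(num_layers, batch_size, hidden_size)

# Take the last layer's hidden state, shape (batch_size, hidden_size),
# and use it where the single-layer model uses state["decoder_hidden"].
top_layer_hidden = decoder_hidden[-1]

# Copy scores in the same spirit as copynet_seq2seq.py: project the encoder
# outputs, then dot each source position with the (top-layer) decoder state.
encoder_outputs = torch.randn(batch_size, src_len, hidden_size)
copy_projection_layer = torch.nn.Linear(hidden_size, hidden_size)
projected = torch.tanh(copy_projection_layer(encoder_outputs))            # (batch, src_len, hidden)
copy_scores = projected.bmm(top_layer_hidden.unsqueeze(-1)).squeeze(-1)   # (batch, src_len)
```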

JohnGiorgi avatar Apr 07 '21 15:04 JohnGiorgi

No worries! @JohnGiorgi, @epwalsh gave the same feedback on the open PR. PTAL when you get a chance :) https://github.com/allenai/allennlp-models/pull/245

annajung avatar Apr 07 '21 15:04 annajung

@annajung Looks good :) I left a comment!

I think I originally thought the goal was to add a composable decoder to the copynet model, similar to what already exists for the ComposedSeq2Seq so that you could easily experiment with the different decoders implemented in AllenNLP.

But re-reading #4451 and #4766, it seems I might have misunderstood, and what I was thinking is probably trickier to implement.
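For reference, this is roughly the kind of decoder block ComposedSeq2Seq composes, sketched here as a Python dict rather than a jsonnet file. The registered names and parameter keys are written from memory and should be treated as illustrative assumptions, not verified API; the idea would be to let CopyNet accept a similar "decoder" block:

```python
# Illustrative only: the kind of "decoder" block ComposedSeq2Seq accepts, which
# is what makes swapping decoders (LSTM cell vs. stacked self-attention) easy.
# All type names and keys here are assumptions written from memory.
composable_decoder_config = {
    "type": "auto_regressive_seq_decoder",
    "max_decoding_steps": 50,
    "target_embedder": {"embedding_dim": 128},
    "decoder_net": {
        "type": "stacked_self_attention",   # or "lstm_cell"
        "decoding_dim": 128,
        "target_embedding_dim": 128,
        "feedforward_hidden_dim": 256,
        "num_layers": 2,
        "num_attention_heads": 4,
    },
}
```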

JohnGiorgi avatar Apr 07 '21 16:04 JohnGiorgi

oh interesting! Do you still want to merge the open PR or look into a composable decoder?

annajung avatar Apr 07 '21 18:04 annajung

@JohnGiorgi @annajung I like this idea of adding a composable decoder to CopyNet, which could also support multi-layer LSTMs. If either of you would like to look into this, I think we should hold off on https://github.com/allenai/allennlp-models/pull/245.

epwalsh avatar Apr 07 '21 18:04 epwalsh

@annajung What do you think? I would be game to try and implement it. I am excited to try out a transformer decoder layer within copynet for my own project. I should be able to start next week!

JohnGiorgi avatar Apr 07 '21 18:04 JohnGiorgi

sure that works, no worries! I can close my PR in favor of a composable decoder. thanks both for the feedback!

annajung avatar Apr 07 '21 19:04 annajung

Okay I have made a decent amount of progress on this. I have replaced self._decoder_cell with self.decoder, which is a SeqDecoder subclass. Now I am trying to figure out how to replace the following:

https://github.com/allenai/allennlp-models/blob/f1de60fc5eab482dfd58d47ca160b497c69b4519/allennlp_models/generation/models/copynet_seq2seq.py#L339-L342

state["decoder_hidden"], state["decoder_context"] = self._decoder_cell(
    projected_decoder_input.float(),
    (state["decoder_hidden"].float(), state["decoder_context"].float()),
)

It looks like I will have to drop down to the forward method of self._decoder._decoder_net, something like:

new_state, state["decoder_hidden"] = self._decoder._decoder_net(
    previous_state=state,
    encoder_outputs=state["encoder_outputs"],
    source_mask=state["source_mask"],
    previous_steps_predictions=last_predictions
)
state["decoder_context"] = new_state.get("decoder_context", None)

I am unsure about this approach for a couple of reasons: partly because it looks like there are differences in how each decoder (LstmCellDecoderNet, StackedSelfAttentionDecoderNet) implements this function, and partly because I don't know how projected_decoder_input fits in.

Any help is welcome! @epwalsh @annajung

JohnGiorgi avatar Apr 09 '21 20:04 JohnGiorgi

@JohnGiorgi one thing to keep in mind is that we want to keep backwards compatibility. So a CopyNet model trained before with a _decoder_cell should still work.

And if the DecoderNet API doesn't really fit, then I think it's okay to create a new abstraction that works better here. Something like CopyNetDecoder.

epwalsh avatar Apr 12 '21 22:04 epwalsh

@JohnGiorgi one thing to keep in mind is that we want to keep backwards compatibility. So a CopyNet model trained before with a _decoder_cell should still work.

I see. So the main considerations here are:

  1. The "old" CopyNet configs can be loaded in the "new" CopyNet.
  2. The state_dict of the old CopyNet can be loaded into the new CopyNet. I think the only place the state_dicts would differ is that the old CopyNet will have these parameters, and the new CopyNet won't:
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.weight_ih
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.weight_hh
0 | 2021-04-13 06:57:14,602 - INFO - allennlp.common.util - _decoder_cell.bias_ih
0 | 2021-04-13 06:57:14,603 - INFO - allennlp.common.util - _decoder_cell.bias_hh

One solution I can think of:

  • keep the current arguments of the old CopyNet (e.g. attention, beam_size, etc) so old configs can be loaded
  • add a new, optional argument, probably called decoder (like ComposedSeq2Seq) that is a DecoderNet.
  • if decoder is not None, init it with the params passed to CopyNet (e.g. attention, beam_size, etc).
  • otherwise, init an LSTMCell as is currently done.

I think this would achieve backwards compatibility. Let me know if you spot a mistake or have a better solution, though!
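A minimal sketch of that constructor logic, using only torch and with illustrative names (CopyNetSketch and the stub decoder class stand in for the real classes and are not the actual API):

```python
from typing import Optional

import torch


class ComposableDecoderStub(torch.nn.Module):
    """Stand-in for whatever composable decoder abstraction gets adopted."""


class CopyNetSketch(torch.nn.Module):
    def __init__(
        self,
        decoder_input_dim: int,
        decoder_output_dim: int,
        decoder: Optional[ComposableDecoderStub] = None,  # new, optional argument
    ) -> None:
        super().__init__()
        if decoder is not None:
            # New path: use the composable decoder supplied by the config.
            self._decoder = decoder
            self._decoder_cell = None
        else:
            # Old path: fall back to a single LSTMCell, so existing configs and
            # state_dicts (which contain _decoder_cell.weight_ih, weight_hh,
            # bias_ih, bias_hh) still load unchanged.
            self._decoder = None
            self._decoder_cell = torch.nn.LSTMCell(decoder_input_dim, decoder_output_dim)
```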

And if the DecoderNet API doesn't really fit, then I think it's okay to create a new abstraction that works better here. Something like CopyNetDecoder.

Okay, I think we may have to create a new abstraction (or do some subclassing). I don't see a way for me to use the two currently implemented DecoderNets without modification. Will work on this.
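If it comes to that, one possible shape for such an abstraction, where the class name, method, and signature are all hypothetical and simply mirror the single-step interface CopyNet currently gets from its LSTMCell:

```python
from typing import Dict

import torch


class CopyNetDecoder(torch.nn.Module):
    """Hypothetical single-step decoder interface for CopyNet: consume the
    (projected) decoder input and the previous state, and return the updated
    state, including the "decoder_hidden" tensor that copy scores are computed
    from."""

    def forward(
        self,
        decoder_input: torch.Tensor,      # (batch_size, decoder_input_dim)
        state: Dict[str, torch.Tensor],   # e.g. decoder_hidden, decoder_context, encoder_outputs
    ) -> Dict[str, torch.Tensor]:
        raise NotImplementedError
```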

JohnGiorgi avatar Apr 13 '21 14:04 JohnGiorgi