
Difference between the pointer network used in Nallapati 2016?

AlJohri opened this issue 6 years ago · 0 comments

hi @abisee, I was wondering if you could help me understand the difference between the pointer network used in Nallapati 2016's *Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond* and your own usage of it.

> **2.3 Modeling Rare/Unseen Words using Switching Generator-Pointer**
>
> Often-times in summarization, the keywords or named-entities in a test document that are central to the summary may actually be unseen or rare with respect to the training data. Since the vocabulary of the decoder is fixed at training time, it cannot emit these unseen words. Instead, the most common way of handling these out-of-vocabulary (OOV) words is to emit an 'UNK' token as a placeholder, but this does not result in legible summaries. In summarization, an intuitive way to handle such OOV words is simply to point to their location in the source document instead. We model this notion using our novel switching decoder/pointer architecture, which is represented graphically in Figure 2. In this model, the decoder is equipped with a 'switch' that decides between using the generator or a pointer at every time-step. If the switch is turned on, the decoder produces a word from its target vocabulary in the normal fashion. However, if the switch is turned off, the decoder instead generates a pointer to one of the word-positions in the source, and the word at the pointer location is copied into the summary. The switch is modeled as a sigmoid activation function over a linear layer based on the entire available context at each time-step. [screenshot of the switch equation]
>
> source: https://arxiv.org/pdf/1602.06023.pdf
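For what it's worth, the switch described in that passage can be sketched in a few lines of NumPy. This is only an illustration under assumed names and shapes (`h_t`, `c_t`, `e_prev`, `W_h`, `W_c`, `W_e` are my own placeholders, not the paper's code): a sigmoid over a linear function of the decoder's available context, whose output decides generate-vs-point.

```python
import numpy as np

# Minimal sketch of the Nallapati-style switch (assumed shapes and
# parameter names, not the authors' implementation): a sigmoid over a
# linear layer of the full decoder context at one time-step.

rng = np.random.default_rng(0)
hidden_dim, ctx_dim, emb_dim = 8, 8, 6

# Context available to the decoder at this time-step (random stand-ins).
h_t = rng.standard_normal(hidden_dim)    # decoder hidden state
c_t = rng.standard_normal(ctx_dim)       # attention context vector
e_prev = rng.standard_normal(emb_dim)    # embedding of previous output word

# Linear-layer parameters (randomly initialised for the sketch).
W_h = rng.standard_normal(hidden_dim)
W_c = rng.standard_normal(ctx_dim)
W_e = rng.standard_normal(emb_dim)
b = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# P(switch = on): sigmoid over a linear function of the entire context.
# Switch on -> emit a word from the target vocabulary;
# switch off -> emit a pointer into the source and copy that word.
p_switch = sigmoid(W_h @ h_t + W_c @ c_t + W_e @ e_prev + b)

print(p_switch)  # a probability in (0, 1)
```

As I understand it, the key design difference relative to the pointer-generator is that Nallapati et al. treat this as a hard switch (one mechanism fires per time-step, with explicit supervision for OOV targets), whereas See et al.'s model uses a similar scalar as a soft weight that mixes the vocabulary and copy distributions at every step; happy to be corrected on that.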

thanks!

AlJohri · Jan 20 '18 21:01