Abi See
@Spandan-Madan looking at the smoothed loss curve in TensorBoard, the training loss was about 2.8 after roughly 230k iterations, before we turned on coverage.
@Spandan-Madan "coverage" is one of the main ideas of the paper. See also [these flags](https://github.com/abisee/pointer-generator/blob/master/run_summarization.py#L63).
You can't (easily) increase those things. You've been training a model that performs matrix transformations on hidden vectors of size 64. You can't use that model to handle hidden...
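To make the shape constraint concrete, here is a minimal numpy sketch (illustrative only, not the repo's code): the weight matrices learned at training time have shapes fixed by the hidden size, so a model trained with hidden vectors of size 64 simply cannot be applied to a larger hidden state.

```python
import numpy as np

# Illustrative sketch: an RNN cell's weight matrix shape is baked in by hidden_dim.
emb_dim, hidden_dim = 128, 64
W = np.random.randn(emb_dim + hidden_dim, hidden_dim)   # shape (192, 64), fixed at training time

def rnn_step(x, h, W):
    # one vanilla RNN step; the matmul only works if h has exactly hidden_dim entries
    return np.tanh(np.concatenate([x, h]) @ W)

x = np.random.randn(emb_dim)
h = rnn_step(x, np.zeros(hidden_dim), W)   # OK: size 64 matches W

h_big = np.zeros(256)                      # a 256-dim state cannot be fed through the 64-dim W
# rnn_step(x, h_big, W)                    # -> ValueError: shapes are not aligned
```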
Not sure I understand the question. `max_dec_steps` refers to the maximum number of steps we will run the decoder RNN. This is the same thing as "max number of abstract...
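In other words, `max_dec_steps` is just a hard cap on how many decoder steps are run, something like this rough sketch (the helper names here are hypothetical, not the repo's code):

```python
# Hypothetical sketch of what max_dec_steps controls: the decoder RNN is run
# for at most this many steps, so no output can be longer than max_dec_steps tokens.
max_dec_steps = 100
STOP_TOKEN = "[STOP]"

def decode(decode_one_step, start_token="[START]"):
    tokens = [start_token]
    for _ in range(max_dec_steps):          # hard cap on abstract length
        next_tok = decode_one_step(tokens)  # run one decoder RNN step
        if next_tok == STOP_TOKEN:          # model may stop earlier on its own
            break
        tokens.append(next_tok)
    return tokens[1:]
```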
You're right, the comments were wrong. Now [fixed](https://github.com/abisee/pointer-generator/commit/0cdcaeeaf8f42d4d64ec2ed09eb2f0158cd0db8f).
Yes, the coverage vector **c** increases monotonically and is unbounded. It is possible for **sum(min(a,c))** to be 1 forever if the attention **a** always attends to something that's already been...
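Here is a small numpy sketch of that behavior (names and shapes are illustrative, not the repo's code): the coverage vector is the running sum of past attention distributions, so it grows without bound, while the per-step coverage penalty **sum(min(a,c))** can never exceed 1 because **a** is a probability distribution.

```python
import numpy as np

enc_len = 5
c = np.zeros(enc_len)                      # coverage starts at zero
attn_history = [np.array([0.90, 0.05, 0.03, 0.01, 0.01]),
                np.array([0.85, 0.10, 0.03, 0.01, 0.01])]  # re-attends to token 0

for a in attn_history:
    covloss = np.sum(np.minimum(a, c))     # penalizes attending to already-covered tokens
    print("coverage loss:", covloss)       # 0.0 at step 1, ~0.95 at step 2
    c += a                                 # coverage accumulates monotonically, unbounded
```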
@Gandor26 For intuition see the [blog post](http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html).
That padding code is for the decoder inputs and targets during _training_, not test-time decoding. During decoding, the decoder is run one step at a time with beam search, and...
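A rough sketch of the distinction (illustrative, not the repo's exact code): training batches need fixed-length decoder inputs/targets, so they are truncated/padded to `max_dec_steps`, whereas beam-search decoding feeds the decoder one token at a time and never touches that padding.

```python
def pad_decoder_seq(tokens, max_dec_steps, pad_id):
    tokens = tokens[:max_dec_steps]                            # truncate long targets
    return tokens + [pad_id] * (max_dec_steps - len(tokens))   # pad short ones

# training: fixed-length, padded targets so examples can be batched together
batch_targets = [pad_decoder_seq(t, max_dec_steps=100, pad_id=0)
                 for t in [[5, 8, 2], [7, 9, 4, 3, 2]]]

# decoding: no padding -- each beam just appends one token per decoder step
```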
Yes, the pointer-generator model is able to produce UNK tokens during decoding. UNK is part of the vocabulary object and the pointer-generator decoder has access to the whole vocabulary.
1. We very rarely / perhaps never see UNKs in the output of the pointer-generator model. However, this is mostly because at test time, the pointer-generator model acts in pointing...
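To see why UNK is possible but rare, here is a minimal numpy sketch of the final distribution (illustrative shapes and names, not the repo's code): `P_final = p_gen * P_vocab + (1 - p_gen) * P_copy`. UNK is an ordinary vocabulary entry, so it can receive probability through the generation path, but when the model is mostly copying source words the copy mass dominates and UNK almost never wins.

```python
import numpy as np

vocab_size, num_oov = 6, 2           # extended vocab: in-vocab words + source OOVs
UNK_ID = 1

p_gen = 0.2                           # model leans toward copying here
P_vocab = np.array([0.1, 0.3, 0.2, 0.2, 0.1, 0.1])   # includes some mass on UNK_ID
attn = np.array([0.7, 0.2, 0.1])                     # attention over 3 source tokens
src_ids = [4, 6, 7]                                  # source tokens mapped to extended-vocab ids

P_final = np.zeros(vocab_size + num_oov)
P_final[:vocab_size] += p_gen * P_vocab              # generation path (can yield UNK)
for pos, tok_id in enumerate(src_ids):
    P_final[tok_id] += (1 - p_gen) * attn[pos]       # copy path (never yields UNK)

print(P_final.argmax())   # a copied source word, not UNK, gets the most probability
```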