
Not sure about the implementation of the copy scatter function

Open · jiacheng-xu opened this issue 6 years ago · 5 comments

Correct me if I am wrong.

```python
attn_dists_projected = [tf.scatter_nd(indices, copy_dist, shape)
                        for copy_dist in attn_dists]
# list of length max_dec_steps, each entry (batch_size, extended_vsize)
```

See https://github.com/abisee/pointer-generator/blob/master/model.py#L176.

For the scatter_nd function, the documentation at https://www.tensorflow.org/api_docs/python/tf/scatter_nd carries this warning:

> WARNING: The order in which updates are applied is nondeterministic, so the output will be nondeterministic if indices contains duplicates.

So for indices that contain duplicates, does your model randomly choose one occurrence and use it as the final attention, rather than summing the attention weights of all occurrences of the word?

If I have misunderstood, I beg your pardon.

jiacheng-xu · Jan 18, 2018
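For context, here is a toy reconstruction of the quoted line's shapes for a single decoder step, following the indices construction in the surrounding model.py code; the dimensions and values are illustrative, not the model's:

```python
import tensorflow as tf

batch_size, enc_len, extended_vsize = 2, 4, 10
attn_dist = tf.random.uniform((batch_size, enc_len))  # attention for one decoder step
src_ids = tf.constant([[1, 3, 3, 5],                  # extended-vocab ids of source tokens;
                       [0, 2, 2, 2]])                 # note the duplicate ids

# pair each attention weight with its (batch row, vocab id) target,
# as model.py does with batch_nums and enc_batch_extend_vocab
batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, enc_len])
indices = tf.stack([batch_nums, src_ids], axis=2)     # (batch_size, enc_len, 2)
attn_dist_projected = tf.scatter_nd(indices, attn_dist,
                                    [batch_size, extended_vsize])
```

The duplicate ids (3 in the first row, 2 in the second) are exactly the case the question is about.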

@jiacheng-xu I think this is what is happening as well, i.e. a random occurrence among the multiple occurrences is selected. I am trying to implement this paper in MXNet and can't figure out how to do it the "correct" way (short of prohibitive computation costs). Any clues?

kalpitdixit · Feb 02, 2018

@kalpitdixit I am implementing it in PyTorch. In PyTorch, every previous occurrence is overwritten by the following ones. So I preprocess the data, counting occurrences and building a mapping, and then at training or test time use a matrix transformation to map every occurrence to the last occurrence. Efficient and accurate. Not sure how MXNet's scatter function behaves, though.

jiacheng-xu · Feb 02, 2018
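For reference, a minimal PyTorch sketch of the summing behavior the paper calls for: scatter_add_ accumulates values at duplicate indices, which avoids the overwrite problem without the mapping trick described above. This is not the commenter's exact code, and the shapes are illustrative:

```python
import torch

batch_size, enc_len, extended_vsize = 2, 4, 10
attn = torch.softmax(torch.rand(batch_size, enc_len), dim=1)  # attention weights
src_ids = torch.tensor([[1, 3, 3, 5],
                        [0, 2, 2, 2]])                        # duplicate ids

copy_dist = torch.zeros(batch_size, extended_vsize)
copy_dist.scatter_add_(1, src_ids, attn)  # duplicates are summed, not overwritten
```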

It seems that, according to the paper, the attention weights should be summed. Since I am using CNTK and there is no such function, I currently use Times with the attention weights and a one-hot representation of the input.

xiang-deng · Feb 02, 2018
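A sketch of that one-hot product in plain PyTorch, since CNTK's Times is analogous; names and dimensions here are illustrative:

```python
import torch
import torch.nn.functional as F

attn = torch.softmax(torch.rand(2, 4), dim=1)         # (batch, enc_len)
src_ids = torch.tensor([[1, 3, 3, 5], [0, 2, 2, 2]])
one_hot = F.one_hot(src_ids, num_classes=10).float()  # (batch, enc_len, vsize)

# the batched matmul sums attention over all positions sharing a vocab id
copy_dist = torch.bmm(attn.unsqueeze(1), one_hot).squeeze(1)  # (batch, vsize)
```

Note that the matrix product sums duplicates by construction, matching the paper's equation.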

@jiacheng-xu Thanks. What exactly do you mean by "counting and mapping"? I see, roughly, how transforming all occurrences to the last occurrence solves the problem. I imagine you have to do some juggling when computing attention...

MXNet's scatter_nd function behaves nondeterministically when multiple occurrences are present, i.e. it randomly selects the value associated with one of the occurrences.

@NO-0044 that's a very neat trick.

kalpitdixit · Feb 02, 2018

@jiacheng-xu Hi, I don't understand this code. In it:

  1. indices has shape [batch_size, enc_time, 2]
  2. copy_dist has shape [batch_size, enc_time]
  3. shape is [batch_size, extended_vsize]: is each pair [x, y] an index into this shape?

xiongma · Apr 25, 2019
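On the shape question above: each innermost pair in indices addresses one element of the output, i.e. a pair [b, v] selects row b, column v of the (batch_size, extended_vsize) result. A tiny tf.scatter_nd example with made-up values:

```python
import tensorflow as tf

indices = tf.constant([[[0, 1], [0, 3]],   # batch item 0: vocab ids 1 and 3
                       [[1, 0], [1, 2]]])  # batch item 1: vocab ids 0 and 2
updates = tf.constant([[0.7, 0.3],
                       [0.4, 0.6]])
# each pair [b, v] writes updates[b, t] into row b, column v of the output
out = tf.scatter_nd(indices, updates, [2, 5])  # (batch_size=2, extended_vsize=5)
```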