Rowanhart_bubu
I have the same problem, and the grad_norm on one of the workers is 0. I still don't know why. Thanks for your solution.
I encountered exactly the same error. Does anyone have a good solution? Thanks!
I want the preprocessing scripts too, because I trained a model on training data that I split myself and got a worse result than the author's.
> The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf and the resulting dataset is shared at: https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view
> I used this simple script to convert the format of...
Because the encoder just extracts information from the source-language sentence, but when the decoder wants to generate a target-language sentence, it needs a signal to tell itself to start decoding with...
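Here is a minimal sketch of what that looks like in practice: greedy decoding has to be seeded with a start-of-sentence id before the model can predict anything. The names `decode_step`, `bos_id`, and `eos_id` are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

# Sketch of greedy autoregressive decoding; `decode_step` is a hypothetical callable
# that returns next-token logits given the tokens generated so far.
def greedy_decode(decode_step, bos_id, eos_id, max_len=50):
    ys = [bos_id]                     # decoding always starts from the <s> seed
    for _ in range(max_len):
        logits = decode_step(ys)      # next-token logits
        next_id = int(np.argmax(logits))
        ys.append(next_id)
        if next_id == eos_id:         # stop once </s> is produced
            break
    return ys[1:]                     # drop the <s> seed
```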
Because the original implementation in the paper uses the same embedding matrix in the encoder and decoder:
> We also use the usual learned linear transformation and softmax function to convert the...
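As an illustration (not the repo's exact code), weight tying can be sketched like this: one embedding matrix is used for the embedding lookup and, transposed, as the pre-softmax linear transformation. `vocab_size` and `d_model` are assumed values.

```python
import tensorflow as tf

vocab_size, d_model = 32000, 512  # assumed sizes for illustration
# One shared embedding matrix for encoder inputs, decoder inputs, and the output projection.
embeddings = tf.Variable(
    tf.random.normal([vocab_size, d_model], stddev=d_model ** -0.5),
    name="shared_embeddings")

def embed(token_ids):
    # Same matrix used for both encoder and decoder input embeddings.
    return tf.nn.embedding_lookup(embeddings, token_ids)

def output_logits(decoder_states):
    # Reuse the embedding matrix (transposed) as the pre-softmax linear transformation.
    return tf.matmul(decoder_states, embeddings, transpose_b=True)
```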
> line1: nonpadding is a matrix that marks the positions of `<pad>`; positions corresponding to `<pad>` are 0 and all other positions are 1.
> line2: first multiply ce by nonpadding to zero out the cross-entropy at the `<pad>` positions, then take a reduce_sum to get the total loss.
> As for the last part, `/ (tf.reduce_sum(nonpadding) + 1e-7)`, I don't know why either.

The last part counts the non-padding positions, i.e. the number of valid tokens; dividing the total loss by that number gives the token-level loss.
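Putting the pieces together, a masked token-level loss might look like the sketch below; the `pad_id` value and tensor shapes are assumptions for illustration.

```python
import tensorflow as tf

def token_level_loss(logits, y, pad_id=0):
    # Per-position cross-entropy, shape (batch, time).
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    # 1.0 at real tokens, 0.0 at <pad> positions.
    nonpadding = tf.cast(tf.not_equal(y, pad_id), tf.float32)
    # Zero out the loss at padded positions, then divide by the number of valid tokens;
    # the 1e-7 only guards against division by zero for an all-padding batch.
    return tf.reduce_sum(ce * nonpadding) / (tf.reduce_sum(nonpadding) + 1e-7)
```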
I ran this script on Ubuntu, and it seemed there was still something wrong with it. Then I replaced every `open` with `codecs.open(filename, "w"/"r", "utf-8")`, and it worked well.
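For anyone hitting the same encoding issue, the change amounts to something like the following; the filenames here are just placeholders.

```python
import codecs

# Open files with an explicit UTF-8 encoding instead of the platform default,
# which avoids UnicodeDecodeError/UnicodeEncodeError on some systems.
with codecs.open("train.tags.de-en.en", "r", "utf-8") as fin:
    lines = fin.readlines()

with codecs.open("train.en", "w", "utf-8") as fout:
    fout.writelines(lines)
```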
Hello, guys. I mean, did you solve this? I met the same problem.
I have the same problem; I hope someone can help us. Maybe it is related to the TensorFlow version?