CS224N-2019
Y_t, o_pre concat order issue?
Nice work! Your implementation reaches a BLEU score of 22+. However, my implementation only reaches 0.16+ BLEU on the test dataset. Comparing it with your work, I found that the difference is the concatenation order: you use torch.cat((Y_t, o_pre), dim=1), while my version with torch.cat((o_pre, Y_t), dim=1) only reaches 0.16+ BLEU.
Could you share your thoughts on why Y_t and o_pre should be concatenated in that order?
Thank you!
Thank you for your kind words! In my opinion, the order of the features in this case shouldn't affect the performance of the model.
In my implementation, imagine that (Y_t, o_prev) has corresponding weights (W_y, W_o). After training has finished, if I change the feature order to (o_prev, Y_t) and also change the weight order to (W_o, W_y), the output is the same:
Y_t * W_y + o_prev * W_o = o_prev * W_o + Y_t * W_y
But if you use your order and train from scratch, I think your model performs differently only because that order (which changes which initial weights are paired with (o_prev, Y_t)) doesn't work well with the default random seed. You can try training a little longer, or set a different random seed, and tell me the BLEU score you get! :muscle:
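To make the argument above concrete, here is a minimal sketch in plain Python (standard library only, not the repository's PyTorch code). It assumes the concatenated vector feeds a single linear map, which is a simplification of the real decoder; the sizes and the `linear` helper are hypothetical and chosen only for illustration. List concatenation with `+` plays the role of torch.cat.

```python
import random


def linear(x, W):
    """Return x @ W for a vector x (length n) and a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


random.seed(0)
e, h, out = 3, 4, 5  # hypothetical sizes: embedding, hidden, output

Y_t = [random.gauss(0, 1) for _ in range(e)]
o_prev = [random.gauss(0, 1) for _ in range(h)]
W_y = [[random.gauss(0, 1) for _ in range(out)] for _ in range(e)]
W_o = [[random.gauss(0, 1) for _ in range(out)] for _ in range(h)]

# Order (Y_t, o_prev) with the weight rows stacked as (W_y, W_o).
out_a = linear(Y_t + o_prev, W_y + W_o)

# Swapped order (o_prev, Y_t) with the weight rows swapped to (W_o, W_y).
out_b = linear(o_prev + Y_t, W_o + W_y)

# Both compute Y_t * W_y + o_prev * W_o, so they agree up to float rounding.
max_diff = max(abs(a - b) for a, b in zip(out_a, out_b))
print(max_diff)
```

The two outputs agree only because the weight rows are permuted together with the features. A model trained from scratch with the other order starts from a different pairing of random initial weights and features, so its training run can differ even though the two models can represent the same functions.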