Grokking-Deep-Learning
Weight updates in RNN
In Chapter 12, why are we using the previous layer's `hidden_delta` to update the embedding layer?
```python
embed_idx = sent[layer_idx]
embed[embed_idx] -= layers[layer_idx]['hidden_delta'] * alpha / float(len(sent))
```
Shouldn't it be:

```python
embed[embed_idx] -= layer['hidden_delta'] * alpha / float(len(sent))
```
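For context, here is a minimal runnable sketch of the update loop in question (a paraphrase, not the book's exact code; the shapes and `hidden_delta` values are random stand-ins, not real gradients). It shows that inside `enumerate(layers[1:])`, `layer` refers to `layers[layer_idx + 1]`, which is why the two lines above index different timesteps:

```python
import numpy as np

np.random.seed(1)

# Hypothetical shapes standing in for the Chapter 12 setup
# (names follow the book; values are random stand-ins).
vocab_size, hidden_size = 10, 4
embed = np.random.randn(vocab_size, hidden_size) * 0.1
sent = [2, 5, 7, 1]   # word indices for one example sentence
alpha = 0.01

# In the book, backprop stores a 'hidden_delta' per timestep layer;
# here we fill them with random vectors just to exercise the loop.
layers = [{'hidden_delta': np.random.randn(hidden_size)}
          for _ in range(len(sent))]

# The weight-update loop iterates over layers[1:], so inside the loop
# `layer` is layers[layer_idx + 1], NOT layers[layer_idx]:
for layer_idx, layer in enumerate(layers[1:]):
    assert layer is layers[layer_idx + 1]
    embed_idx = sent[layer_idx]
    # Book's line (delta of the layer at the same index as the word):
    #   embed[embed_idx] -= layers[layer_idx]['hidden_delta'] * alpha / float(len(sent))
    # Suggested line (delta of the layer this embedding fed into):
    embed[embed_idx] -= layer['hidden_delta'] * alpha / float(len(sent))
```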
Also, why do we need to update the embedding of the last word in the sentence at all, given that it is the word the network is being asked to predict?