sonnet
This portion of the attention code looks incorrect:
attention_mlp = basic.BatchApply(
    mlp.MLP([self._mem_size] * self._attention_mlp_layers))
for _ in range(self._num_blocks):
    attended_memory = self._multihead_attention(memory)
Shouldn't it be this instead?
attended_memory = memory
for _ in range(self._num_blocks):
attended_memory = self._multihead_attention(attended_memory)
As far as I can tell, memory isn't mutated inside _multihead_attention either, so as written every iteration of the loop recomputes the same attended_memory. Isn't the loop redundant?
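To make the redundancy concrete, here is a minimal standalone sketch. toy_attention is a hypothetical stand-in for self._multihead_attention (not Sonnet's API), assumed to be a pure function that returns a new array without modifying its input:

import numpy as np

def toy_attention(memory):
    # Hypothetical stand-in for self._multihead_attention: a pure
    # function of its input; it never mutates its argument.
    return np.tanh(memory @ memory.T @ memory)

memory = np.random.RandomState(0).randn(4, 8)
num_blocks = 3

# As in the snippet above: the loop body only ever sees the original
# memory, so every iteration computes the identical result.
for _ in range(num_blocks):
    attended_memory = toy_attention(memory)
redundant = attended_memory

# Proposed fix: feed each block's output into the next block, so the
# attention is actually composed num_blocks times.
attended_memory = memory
for _ in range(num_blocks):
    attended_memory = toy_attention(attended_memory)
stacked = attended_memory

print(np.allclose(redundant, toy_attention(memory)))  # True: the loop was a no-op
print(np.allclose(redundant, stacked))                # False: the fix changes the output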