arctic-captions
Bug in doubly stochastic attention?
It seems that when computing the doubly stochastic attention regularizer, the code does:
alpha_reg = alpha_c * ((1.-alphas.sum(0))**2).sum(0).mean()
As I understand it, alphas has shape [sequence_length, batch_size, feature_map_spatial_extent], where the spatial extent for VGG conv5 is 14 x 14 = 196.
After alphas.sum(0) the array has shape [batch_size, 196], so the following .sum(0) sums over the minibatch and .mean() averages over the 196 spatial locations, as opposed to summing over locations and averaging over the minibatch. Is this the expected behavior?
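To make the question concrete, here is a minimal NumPy sketch (not the repository's Theano code) comparing the expression as written with the variant I expected. The sizes T, B, L, the random alphas, and the names reg_as_written and reg_batch_mean are all made up for illustration; only the shape convention [sequence_length, batch_size, 196] comes from my reading above.

```python
import numpy as np

# Hypothetical sizes: T time steps, B examples per minibatch, L = 14 * 14 locations.
T, B, L = 5, 4, 196
rng = np.random.default_rng(0)

# Random attention weights, softmax-normalized over the spatial axis.
logits = rng.normal(size=(T, B, L))
alphas = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
alpha_c = 1.0  # assumed regularization weight

# Per-example, per-location penalty; shape [B, L] after summing over time.
penalty = (1.0 - alphas.sum(0)) ** 2

# As written in the code: sum over the minibatch, then mean over locations.
reg_as_written = alpha_c * penalty.sum(0).mean()

# Variant I expected: sum over locations, then mean over the minibatch.
reg_batch_mean = alpha_c * penalty.sum(1).mean()

# Both reduce the same [B, L] array, so they differ by the constant factor B / L.
ratio = reg_as_written / reg_batch_mean
print(penalty.shape, ratio, B / L)
```

If this sketch matches what the code does, the two versions differ only by the constant factor batch_size / 196, so the difference could be absorbed into alpha_c, but the effective regularization strength would then depend on the batch size.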
Any clarification on this would be great!