vecalign
Why remove global mean when halving vectors?
Thanks for sharing the excellent source code. I am confused about the function that halves the vectors:
def downsample_vectors(vecs1):
    a, b, c = vecs1.shape
    half = np.empty((a, b // 2, c), dtype=np.float32)
    for ii in range(a):
        # average consecutive vectors
        for jj in range(0, b - b % 2, 2):
            v1 = vecs1[ii, jj, :]
            v2 = vecs1[ii, jj + 1, :]
            half[ii, jj // 2, :] = v1 + v2
        # compute mean for all vectors
        mean = np.mean(half[ii, :, :], axis=0)
        for jj in range(0, b - b % 2, 2):
            # remove mean
            half[ii, jj // 2, :] = half[ii, jj // 2, :] - mean
    # make vectors norm==1 so dot product is cosine distance
    make_norm1(half)
    return half
Why do you remove the global mean along the first axis instead of simply dividing each summed vector by 2? It would be very helpful if you could share your motivation and insights on this.
I'm also currently trying to understand vecAlign...
My best guess so far is the explanation from the paper: "... We also find vectors for large blocks of sentences become correlated with each other, so we center them around zero vector."
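To make the quoted motivation concrete, here is a minimal sketch on synthetic data, not vecalign's own code. It assumes the "correlation" shows up as a large component shared by all vectors; the helper `mean_pairwise_cosine` and the toy data are hypothetical names introduced only for this illustration.

```python
import numpy as np

def mean_pairwise_cosine(vecs):
    """Average cosine similarity over all pairs of distinct rows."""
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(vecs)
    off_diag = sims[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(off_diag.mean())

rng = np.random.default_rng(0)
# Assumption: every block vector carries the same large offset plus noise.
shared = np.full(16, 5.0)
block_vecs = shared + rng.normal(size=(50, 16))

before = mean_pairwise_cosine(block_vecs)
after = mean_pairwise_cosine(block_vecs - block_vecs.mean(axis=0))
print(before, after)
```

Under these assumptions, `before` comes out close to 1 (all vectors point in nearly the same direction), while `after` is close to 0, so cosine similarity can discriminate between the vectors again.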
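After a bit of arithmetic, it turns out that the mean of the modified vectors is the zero vector: subtracting the mean from every vector also subtracts it from their average, which leaves zero. This holds before the final normalization step.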
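The arithmetic can be checked numerically. The sketch below is a vectorized re-creation of the sum-and-center step for a single document, written for illustration only; `sum_pairs_and_center` is a hypothetical name, and the unit-norm step is deliberately left out because the zero-mean property is only guaranteed before normalization.

```python
import numpy as np

def sum_pairs_and_center(vecs):
    """vecs has shape (b, c); returns (b // 2, c) pair sums with their mean removed."""
    b = vecs.shape[0] - vecs.shape[0] % 2  # ignore a trailing unpaired vector
    summed = vecs[0:b:2] + vecs[1:b:2]     # add consecutive vectors
    return summed - summed.mean(axis=0)    # center around the zero vector

rng = np.random.default_rng(0)
vecs = rng.normal(loc=3.0, size=(7, 4))    # toy data with a nonzero mean
centered = sum_pairs_and_center(vecs)
is_zero_mean = np.allclose(centered.mean(axis=0), 0.0)
print(is_zero_mean)
```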
As for why the sum is not divided by 2, I suspect it is because cosine similarity depends only on the angle between vectors. Dividing by 2 would shorten the vectors without changing the angle, and make_norm1 rescales them to unit length anyway, so the division can be omitted.
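This can also be verified with a small sketch. It is not the repository's code: `center_and_normalize` is a hypothetical stand-in for the mean removal followed by `make_norm1`, assuming the latter simply scales each vector to unit length.

```python
import numpy as np

def center_and_normalize(summed):
    """Remove the mean over vectors, then scale each vector to unit length."""
    centered = summed - summed.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(8, 4))
pair_sums = vecs[0::2] + vecs[1::2]  # v1 + v2, as in the quoted function
pair_means = pair_sums / 2.0         # (v1 + v2) / 2, the proposed alternative

same = np.allclose(center_and_normalize(pair_sums),
                   center_and_normalize(pair_means))
print(same)
```

The factor 1/2 scales the vectors and their mean alike, so it survives centering as a common factor and is then cancelled by the normalization.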