vecalign

Why remove global mean when halving vectors?

Open Brucewuzhang opened this issue 3 years ago • 1 comments

Thanks for sharing the excellent source code. I am confused about the vector-halving function, downsample_vectors:

def downsample_vectors(vecs1):
    a, b, c = vecs1.shape
    half = np.empty((a, b // 2, c), dtype=np.float32)
    for ii in range(a):
        # average consecutive vectors
        for jj in range(0, b - b % 2, 2):
            v1 = vecs1[ii, jj, :]
            v2 = vecs1[ii, jj + 1, :]
            half[ii, jj // 2, :] = v1 + v2
        # compute mean for all vectors
        mean = np.mean(half[ii, :, :], axis=0)
        for jj in range(0, b - b % 2, 2):
            # remove mean
            half[ii, jj // 2, :] = half[ii, jj // 2, :] - mean
    # make vectors norm==1 so dot product is cosine distance
    make_norm1(half)
    return half

Why do you subtract the mean over all the halved vectors (axis 0 of half[ii]) instead of simply dividing each summed vector by 2? It would be very helpful if you could share your motivation and insights on this.

Brucewuzhang, Apr 16 '21 02:04

I'm also currently trying to understand vecAlign...

My best guess so far is the explanation from the paper: "... We also find vectors for large blocks of sentences become correlated with each other, so we center them around zero vector."

The code does exactly that: after a bit of arithmetic, it turns out that the mean of the modified vectors (after subtracting the mean, before normalization) is the zero vector.
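The arithmetic can also be checked numerically. This is only a minimal sketch with random data standing in for one row of `half`; the array name `summed` and its shape are made up for illustration and are not taken from vecalign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for half[ii, :, :] right after the pairwise sums:
# 5 summed vectors of dimension 4.
summed = rng.normal(size=(5, 4)).astype(np.float32)

# Same operation as in downsample_vectors: subtract the mean over all vectors.
mean = np.mean(summed, axis=0)
centered = summed - mean

# The mean of the centered vectors is the zero vector, up to float rounding.
print(np.allclose(np.mean(centered, axis=0), 0.0, atol=1e-5))
```

Note that this holds before the normalization step; once every vector is rescaled to unit length, the mean is generally no longer exactly zero.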

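As to the question of why not divide by 2, I suspect it is because the cosine similarity depends only on the angle between vectors, not on their length. Dividing by 2 would also halve the mean, so the centered vectors would just be half as long and point in the same direction. Since the vectors are normalized to length 1 afterwards anyway, the division can be omitted.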
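Here is a small sketch of that argument. `make_norm1` is not shown in this thread, so the helper below only assumes it rescales each vector to unit length (as the code comment suggests); `center_and_normalize` is a hypothetical name, not a vecalign function.

```python
import numpy as np

def center_and_normalize(vectors):
    # Hypothetical helper: subtract the mean vector, then scale each
    # row to unit length (assumed to be what make_norm1 does).
    centered = vectors - vectors.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

rng = np.random.default_rng(0)
summed = rng.normal(size=(5, 4))   # v1 + v2 for each pair
averaged = summed / 2.0            # (v1 + v2) / 2 for each pair

# Summing and averaging give the same vectors after centering and normalizing.
same = np.allclose(center_and_normalize(summed), center_and_normalize(averaged))
print(same)
```

So under this assumption the choice between summing and averaging has no effect on the final vectors; only the mean removal changes the directions.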

janisdd, Apr 25 '24 14:04