
RemoveAccidentalHits Does Not Work Correctly

Open mustfkeskin opened this issue 1 year ago • 8 comments

First I will explain the code: I took the RemoveAccidentalHits(tf.keras.layers.Layer) class from here and analyzed it.

import tensorflow as tf

# MIN_FLOAT 
MIN_FLOAT = 1e-8

# Create sample data
batch_size = 3
num_candidates = 4

# Sample labels (one-hot encoded)
labels = tf.constant([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0]
], dtype=tf.float32)

# Sample logits
logits = tf.random.normal(shape=(batch_size, num_candidates))

# Sample candidate_ids
candidate_ids = tf.constant(["101", "102", "101", "104"], dtype=tf.string)

# Create an instance of the RemoveAccidentalHits layer
remove_accidental_hits_layer = RemoveAccidentalHits()

# Modify logits using the custom layer
modified_logits = remove_accidental_hits_layer(labels, logits, candidate_ids)

# Print the original and modified logits
print("Original Logits:")
print(logits.numpy())
print("\nModified Logits:")
print(modified_logits.numpy())

Output of the layer below. Nothing changes here, but I expected logits[0, 2] and logits[2, 0] to be set to MIN_FLOAT.

Original Logits:
[[ 0.82528496 -1.3708178   0.21487834 -0.06206743]
 [-0.79802114  1.9325752   1.1109723   0.963176  ]
 [ 1.3992568  -0.7718229   2.2503998   0.77576673]]

Modified Logits:
[[ 0.82528496 -1.3708178   0.21487835 -0.06206743]
 [-0.79802114  1.9325752   1.1109723   0.963176  ]
 [ 1.3992568  -0.7718229   2.2503998   0.77576673]]

The problem here is that the entries marked as duplicates are supposed to be assigned a very small value, but there is a small error: logits + duplicate * MIN_FLOAT. When we assign it this way, the values stay almost the same. The code needs to be updated as I did below.

As far as I understand, the duplicate variable works like a mask. This part works correctly.

duplicate = duplicate - labels
duplicate
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[0., 0., 1., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.]], dtype=float32)>

My suggestion: first we need to take the inverse of the duplicate matrix.

inverse_duplicate = tf.where(tf.equal(duplicate, 1), 0., 1.)
inverse_duplicate
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[1., 1., 0., 1.],
       [1., 1., 1., 1.],
       [0., 1., 1., 1.]], dtype=float32)>

Then we need to change the last line to this:

logits * inverse_duplicate + MIN_FLOAT
array([[ 8.2528496e-01, -1.3708178e+00,  9.9999999e-09, -6.2067419e-02],
       [-7.9802114e-01,  1.9325752e+00,  1.1109723e+00,  9.6317601e-01],
       [ 9.9999999e-09, -7.7182293e-01,  2.2503998e+00,  7.7576673e-01]],
      dtype=float32)
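
Putting it together, here is a minimal sketch of how I would write the whole layer with this change. The class name is mine, and the duplicate computation just follows my reading of the library code, so the details may differ from the actual implementation:

import tensorflow as tf

MIN_FLOAT = 1e-8

class RemoveAccidentalHitsSuggested(tf.keras.layers.Layer):
    """Sketch of the suggested fix: zero out accidental hits instead of
    adding duplicates * MIN_FLOAT to the logits."""

    def call(self, labels, logits, candidate_ids):
        # Candidate id of the positive for each row in the batch.
        positive_indices = tf.math.argmax(labels, axis=1)
        positive_candidate_ids = tf.gather(candidate_ids, positive_indices)

        # Mark every candidate that shares an id with the row's positive.
        duplicates = tf.cast(
            tf.equal(tf.expand_dims(positive_candidate_ids, 1), candidate_ids),
            labels.dtype)

        # Keep the positive itself unmasked.
        duplicates = duplicates - labels

        # Suggested change: multiply by the inverted mask instead of
        # adding duplicates * MIN_FLOAT.
        inverse_duplicates = tf.where(tf.equal(duplicates, 1.0), 0.0, 1.0)
        return logits * inverse_duplicates + MIN_FLOAT

With the sample data above, this sets the accidental hits at (0, 2) and (2, 0) to MIN_FLOAT and leaves everything else untouched.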

mustfkeskin avatar Jan 16 '24 08:01 mustfkeskin

If this is true, it would explain a weird behavior I used to see: when decreasing the batch size I generally get a better model (the reason might be that there are fewer opportunities for accidental hits). But I have a question: I assume we usually have a square matrix as input to the accidental hits layer, so why are you assuming different numbers for the batch size and the candidate ids?

OmarMAmin avatar Jan 29 '24 18:01 OmarMAmin

When training the model, we only know the embeddings from the query and candidate towers, but we do not know which userid or productid these embeddings belong to. Let's assume that the 5th row and the 12th row are the same product. The model takes only the diagonal of the square matrix formed after the dot product as positives and treats all other entries as negatives. But the 5th and 12th rows are the same productid, so if we do not feed this information, the same productid will enter the model as both a positive and a negative sample. Therefore, we need to handle accidental hits. However, if the dataset is shuffled, the probability of such examples occurring decreases.
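
To make that concrete, here is a toy sketch (the ids are made up) showing how a row whose positive product also appears elsewhere in the batch ends up with an accidental hit:

import tensorflow as tf

# In-batch candidates: positions 0 and 2 are the same product ("101"),
# so query 0's positive also shows up among query 0's negatives,
# and the same happens for query 2.
candidate_ids = tf.constant(["101", "102", "101", "104"])

# With in-batch sampling, query i's positive is candidate i.
positive_ids = candidate_ids

# Which (query, candidate) pairs share the same product id?
same_id = tf.cast(
    tf.equal(tf.expand_dims(positive_ids, 1), candidate_ids), tf.float32)

# Remove the diagonal (the true positive) to keep only accidental hits.
accidental_hits = same_id - tf.eye(4)
print(accidental_hits.numpy())
# [[0. 0. 1. 0.]
#  [0. 0. 0. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 0.]]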

mustfkeskin avatar Jan 30 '24 06:01 mustfkeskin

I understand that. My understanding is that we end up with a square logits matrix, but the example you used for the accidental hits has 4 candidates and 3 queries; I'm wondering if that may be the reason (highly unlikely).

For a batch we usually have a set of queries and candidates of the same size (regardless of having accidental hits or not).

We will only have more candidates than queries if we add extra uniform negative samples (and the accidental hits layer should also work well in that case).

Are you using the latest version?

OmarMAmin avatar Jan 30 '24 11:01 OmarMAmin

This is what I got when executing your code:

(screenshot of the output)

OmarMAmin avatar Jan 30 '24 11:01 OmarMAmin

I guess it's currently working correctly, right? It makes the softmax for this value almost 0.
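
A quick sketch (with made-up logits, the third position being the masked one) showing that a logit pushed to a huge negative ends up with essentially zero softmax probability:

import tensorflow as tf

# The masked position contributes ~0 probability after the softmax.
logits = tf.constant([[0.8, -1.4, -3.4e36, -0.06]])
print(tf.nn.softmax(logits).numpy())
# approx. [[0.65 0.07 0.   0.28]]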

OmarMAmin avatar Jan 30 '24 11:01 OmarMAmin

I believe this needs to be fixed as you mentioned. I tried a different batching scheme (which supposedly should work better) but it performed worse, and one explanation would be that remove_accidental_hits is not working correctly.

OmarMAmin avatar May 31 '24 14:05 OmarMAmin

Can you please check this, @maciejkula?

I tried to create batches from similar products (to make it harder for the model), so these batches have many duplicate classes, and I saw a performance drop. I wasn't sure why this was happening, but this would explain it. Can you please confirm whether it's a bug?

OmarMAmin avatar May 31 '24 14:05 OmarMAmin

I checked the test case for that layer, and it didn't make sense to me.

I tried to run the test script here

import numpy as np
from tensorflow_recommenders.layers import loss

shape = (2, 4)
rng = np.random.RandomState(10)

logits = rng.uniform(size=shape).astype(np.float32)
labels = rng.permutation(np.eye(*shape).T).T.astype(np.float32)
candidate_ids = rng.randint(0, 3, size=shape[-1])

out_logits = loss.RemoveAccidentalHits()(
    labels, logits, candidate_ids).numpy()

And i got the following values

candidate_ids [2 0 0 0]
logits
[[0.77132064 0.02075195 0.6336482  0.74880385]
 [0.49850702 0.22479665 0.19806287 0.7605307 ]]
modified_logits
[[ 7.7132064e-01 -3.4028236e+36 -3.4028236e+36  7.4880385e-01]
 [ 4.9850702e-01  2.2479665e-01  1.9806287e-01  7.6053071e-01]]

For these values I'd expect that the second row would be the only one that gets modified (since the last three candidates share the same id), but as you can see here, the modified values are at (0, 1) and (0, 2), which doesn't make any sense to me; the modified values should be at (1, 2) and (1, 3).
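
One way to double-check which positions should be masked is to compare each row's positive candidate id against all candidate ids. A quick sketch (the helper name is mine), reusing labels and candidate_ids from the test script above:

import numpy as np

def expected_masked_positions(labels, candidate_ids):
    # Column of the positive for each row, and that candidate's id.
    positive_cols = labels.argmax(axis=1)
    positive_ids = candidate_ids[positive_cols]
    # Positions that share an id with the positive, excluding the positive itself.
    same_id = positive_ids[:, None] == candidate_ids[None, :]
    return same_id & (labels == 0)

print(expected_masked_positions(labels, candidate_ids))

Whatever this prints should line up with the positions that end up at -3.4028236e+36 in modified_logits.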

OmarMAmin avatar May 31 '24 15:05 OmarMAmin