triplet_recommendations_keras Both test loss and validation loss go to 0.5

I modified your code for my problem and added regularization terms to the Embedding layers. But when I train the model, both test loss and val loss go to 0.5. My guess is that both the user and item latent vectors shrink to zero, which makes bpr_triplet_loss always equal 0.5.
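
To make that guess concrete, here is a minimal pure-Python sketch, assuming the loss has the form 1 - sigmoid(u . (p - n)). The helper names `sigmoid` and `triplet_loss` and the dimension of 50 are only illustrative; they are not part of Keras or of the repository.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def triplet_loss(user, pos, neg):
    # 1 - sigmoid(u . (p - n)) for a single (user, positive, negative) triplet
    score = sum(u * (p - n) for u, p, n in zip(user, pos, neg))
    return 1.0 - sigmoid(score)


dim = 50  # example dimension
zeros = [0.0] * dim

# With every latent vector at zero the score is 0, so the loss is 1 - sigmoid(0).
loss_at_zero = triplet_loss(zeros, zeros, zeros)
print(loss_at_zero)  # 0.5
```

Any configuration in which the scores are all close to zero gives roughly the same value, so a loss stuck near 0.5 does not by itself tell you whether the vectors collapsed or simply never moved away from a small initialization.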

Do you have any idea why this happens?

Jun 29 '17 10:06 pntuananh

Maybe you added too much regularization?

Jun 29 '17 10:06 maciejkula

Even when I set regularization to 0, it still happens. You can check my code (sorry for the messy code, I'm still debugging):

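# Imports inferred from the names used below (the original post omitted them).
# DIR, the data directory, is defined elsewhere in the original script.
from collections import defaultdict as dd

import numpy as np
from keras import backend as K
from keras.initializers import RandomUniform
from keras.layers import Embedding, Flatten, Input, merge
from keras.models import Model
from keras.optimizers import SGD
from keras.regularizers import l2

training_data = []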
tuning_data   = []
all_users = set()
all_pois  = set()
all_words = set()

user_pois = dd(set)

dim = 50
reg = 0.01
max_words = 200
num_negs = 5
learning_rate = 0.001
num_epochs = 500
batch_size = 256

def load_data():
    print 'loading data...'
    for filename in ['new_training.txt', 'new_tuning.txt']:
        print filename
        for line in open(DIR + filename):
            line = map(int,line.split())
            user = line[0]
            poi = line[1]

            all_users.add(user)
            all_pois.add(poi)

            words = line[2:max_words+2]
            all_words.update(words)

            if filename == 'new_training.txt':
                training_data.append([user,poi,words])
                user_pois[user].add(poi)
            else:
                tuning_data.append([user,poi,words])


load_data()

num_users = max(all_users)
num_pois  = max(all_pois)
num_words = max(all_words)

all_users = list(all_users)
all_pois  = list(all_pois)
all_words = list(all_words)

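def identity_loss(y_true, y_pred):
    # The model's output is already the loss; y_true is a dummy target
    # that is multiplied by 0 and therefore ignored.
    return K.mean(y_pred - 0 * y_true)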


def bpr_triplet_loss(X):
    user_latent, pos_poi_latent, neg_poi_latent = X

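    # BPR loss: 1 - sigmoid(u . (p - n)). If all latent vectors are zero,
    # this is 1 - sigmoid(0) = 0.5 for every triplet.
    loss = 1.0 - K.sigmoid(
        K.sum(user_latent * (pos_poi_latent - neg_poi_latent), axis=-1, keepdims=True))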

    return loss


def build_model():
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    pos_poi_input  = Input(shape=(1,), dtype='int32', name='pos_poi_input')
    neg_poi_input  = Input(shape=(1,), dtype='int32', name='neg_poi_input')

    # Embedding layer
    Embedding_user = Embedding(input_dim = num_users+1, output_dim = dim, name = 'user_embedding',
                                  embeddings_initializer = RandomUniform(), embeddings_regularizer = l2(reg), input_length=1)

    item_embedding_layer = Embedding(input_dim = num_pois+1, output_dim = dim, name='item_embedding',
                                  embeddings_initializer = RandomUniform(), embeddings_regularizer = l2(reg), input_length=1)

    
    user_latent = Embedding_user(user_input)
    user_latent = Flatten()(user_latent)

    pos_poi_latent = item_embedding_layer(pos_poi_input)
    pos_poi_latent = Flatten()(pos_poi_latent)

    neg_poi_latent = item_embedding_layer(neg_poi_input)
    neg_poi_latent = Flatten()(neg_poi_latent)

    loss = merge([user_latent, pos_poi_latent, neg_poi_latent], mode=bpr_triplet_loss, output_shape=(1,))
    # prediction = Dense(1, activation='sigmoid', name='prediction')(predict_score)

    model = Model(input=[user_input, pos_poi_input, neg_poi_input], output=loss)

    return model


training_idx = range(len(training_data))*num_negs
np.random.shuffle(training_idx)

mini_batch_size = 200000
start_idx = 0
end_idx = mini_batch_size

def get_training_instances():
    global start_idx, end_idx

    user_input, pos_poi_input, neg_poi_input = [], [], []

    if start_idx >= len(training_idx):
        np.random.shuffle(training_idx)
        start_idx = 0
        end_idx = mini_batch_size

    for samp in xrange(start_idx, min(end_idx, len(training_idx))):
        idx = training_idx[samp]
        user, poi, words = training_data[idx]

        user_input.append(user)
        pos_poi_input.append(poi)

        while True:
            neg_poi = all_pois[np.random.randint(len(all_pois))]
            if neg_poi not in user_pois[user]:
                break

        neg_poi_input.append(neg_poi)

    start_idx = end_idx
    end_idx += mini_batch_size

    return user_input, pos_poi_input, neg_poi_input


def get_validation_set():
    user_input, pos_poi_input, neg_poi_input = [],[],[]
    for user, pos_poi, words in tuning_data:

        user_input.append(user)
        pos_poi_input.append(pos_poi)

        while True:
            neg_poi = all_pois[np.random.randint(len(all_pois))]
            if neg_poi not in user_pois[user]:
                break

        neg_poi_input.append(neg_poi)

    return user_input, pos_poi_input, neg_poi_input 

model = build_model()

optimizer = SGD(lr=learning_rate)
model.compile(optimizer=optimizer, loss=identity_loss)

val_user_input, val_pos_poi_input, val_neg_poi_input = get_validation_set()

x_validate = [np.array(val_user_input), np.array(val_pos_poi_input), np.array(val_neg_poi_input)]
y_validate = np.ones(len(val_user_input))

for epoch in xrange(num_epochs):

    user_input, pos_poi_input, neg_poi_input = get_training_instances()

    x_training = [np.array(user_input), np.array(pos_poi_input), np.array(neg_poi_input)] 
    y_training = np.ones(len(user_input))

    model.fit(x_training, y_training,
              validation_data = (x_validate, y_validate),
              batch_size=batch_size, nb_epoch=1, shuffle=True
             )

model.save(DIR + 'model_test.txt')

Jun 29 '17 11:06 pntuananh

It's hard to say what's going wrong. Are you sure you set regularization to 0.0? In the code above it's set to 0.01.

I would spend some time refactoring the code; that would make it easier for you to understand what's actually happening.

Jun 29 '17 20:06 maciejkula

Even when I set regularization to 0.0, this still happens.

What is stranger is that when I change the model to simple matrix factorization (using the dot product between user and item latent factors to fit the matrix), the issue still happens. When I then try to fit only the nonzero values (which are all 1), it still happens. The training loss goes to 1.0, because all the dot products are 0 (I use 'mse' as the loss metric).
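
The arithmetic behind that 1.0 can be checked with a small sketch. The `mse` helper and the eight example targets are made up for illustration; only the targets of 1 and the predictions of 0 come from the description above.

```python
def mse(y_true, y_pred):
    # Mean squared error over paired targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [1.0] * 8      # only the nonzero entries, which are all 1
predictions = [0.0] * 8  # dot products of all-zero latent vectors

loss_all_zero = mse(targets, predictions)
print(loss_all_zero)  # 1.0
```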

I think there must be something wrong somewhere in my code.

Jun 30 '17 02:06 pntuananh

I think I know the problem: the Embedding layers in the current Keras are broken when used with regularization!!

According to the above link, embeddings are normally not regularized. But I really don't understand why the model does not overfit without regularization. Are there any constraints that keep the model from overfitting?

Now we have three choices: 1) do not use any regularization at all, 2) assign update weights to each user and item based on its frequency, as suggested in the above link, 3) use embeddings_constraint. I have tested option 3 in the simple matrix factorization model; it produces a better validation loss, although the training loss is worse. I will test this option in your model later.
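
For reference, this is roughly what a constraint does to the embedding rows, as a pure-Python sketch. It assumes a max-norm constraint purely for illustration (the exact constraint passed to embeddings_constraint is not stated here), and `clip_row_norm` is a hypothetical helper, not a Keras function.

```python
import math


def clip_row_norm(row, max_norm):
    # Rescale the row only if its Euclidean norm exceeds max_norm.
    norm = math.sqrt(sum(x * x for x in row))
    if norm <= max_norm:
        return list(row)
    return [x * max_norm / norm for x in row]


embeddings = [[3.0, 4.0], [0.3, 0.4]]  # toy 2-dimensional latent vectors
constrained = [clip_row_norm(row, 1.0) for row in embeddings]
print(constrained)
```

Unlike an L2 penalty, a projection of this kind leaves vectors that are already inside the bound untouched, so it cannot by itself pull every vector toward zero.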

Jul 01 '17 07:07 pntuananh

Updating every embedding weight after every minibatch when an L2 penalty is applied is the right behaviour. It is perfectly fine to use L2 regularization, but the penalty needs to be small enough.
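
A rough sketch of why the size of the penalty matters. It assumes plain SGD, a penalty of the form reg * sum(w ** 2), and an embedding row that receives no gradient from the data. The step counts are derived from the settings in the script earlier in the thread and assume every fit call processes a full 200000-sample chunk; `l2_shrink_factor` is a hypothetical helper.

```python
def l2_shrink_factor(learning_rate, reg, steps):
    # One SGD step on reg * w**2 alone maps w to w * (1 - 2 * learning_rate * reg),
    # so after `steps` updates the weight is scaled by that factor to the power `steps`.
    return (1.0 - 2.0 * learning_rate * reg) ** steps


learning_rate = 0.001
reg = 0.01
batches_per_fit = 782  # ceil(200000 / 256)
num_fit_calls = 500

total_steps = batches_per_fit * num_fit_calls
remaining = l2_shrink_factor(learning_rate, reg, total_steps)
print(total_steps, remaining)
```

Under these assumptions, a row that is rarely or never sampled ends the run with well under one percent of its initial magnitude. With a smaller penalty or far fewer updates the shrinkage is much milder.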

In your case, I would:

Try to refactor the code. For one, you have functions that close over and modify variables defined in the outer scope. I worry that you are making mistakes you (or I) can't see because your training code is not clear.
You are running a huge number of epochs.
Try a different optimizer: Adam or Adagrad.
Try a larger learning rate.
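
As a toy illustration of the learning-rate point (not a reproduction of the Keras model), the sketch below runs plain SGD on 1 - sigmoid(u . (p - n)) for a single triplet. It assumes vectors initialised uniformly in [-0.05, 0.05]; the function name `train_one_triplet`, the seed, the step count and the larger rate of 0.1 are all arbitrary choices made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_one_triplet(learning_rate, steps, dim=50, seed=0):
    """Run plain SGD on 1 - sigmoid(u . (p - n)) for one triplet; return the final loss."""
    rng = random.Random(seed)
    u = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    p = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    n = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    for _ in range(steps):
        score = sum(ui * (pi - ni) for ui, pi, ni in zip(u, p, n))
        s = sigmoid(score)
        g = s * (1.0 - s)  # equals -d(loss)/d(score), always >= 0
        # Gradient descent step: every vector moves so that the score increases.
        new_u = [ui + learning_rate * g * (pi - ni) for ui, pi, ni in zip(u, p, n)]
        new_p = [pi + learning_rate * g * ui for ui, pi in zip(u, p)]
        new_n = [ni - learning_rate * g * ui for ui, ni in zip(u, n)]
        u, p, n = new_u, new_p, new_n
    score = sum(ui * (pi - ni) for ui, pi, ni in zip(u, p, n))
    return 1.0 - sigmoid(score)


loss_small_lr = train_one_triplet(learning_rate=0.001, steps=1000)
loss_large_lr = train_one_triplet(learning_rate=0.1, steps=1000)
print(loss_small_lr, loss_large_lr)
```

With the small rate the loss stays close to 0.5 after 1000 updates, while the larger rate drives it well below that. In the real model the loss is averaged over the minibatch, so the step each individual triplet sees is smaller still.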

The good news is that there is an issue with your code or hyperparameters: the framework is fine.

Jul 01 '17 08:07 maciejkula
