triplet_recommendations_keras
Both test loss and validation loss go to 0.5
I modified your code for my problem and added regularization terms to the Embedding layers. But when I train the model, both test loss and val loss go to 0.5. I guess this is because both the user and item latent vectors shrink to zero, which makes bpr_triplet_loss always equal 0.5 (since 1 - sigmoid(0) = 0.5).
Do you have any idea why this happens?
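To make the 0.5 figure concrete, here is a minimal pure-Python sketch of the same loss formula (1 - sigmoid of the user vector dotted with the positive-minus-negative item difference). It is only an illustration, not the Keras code from this thread: the helper names, the seed, and the choice of a uniform range of [-0.05, 0.05] (which, to my knowledge, is the Keras default for RandomUniform) are assumptions. It suggests that the loss is exactly 0.5 for all-zero vectors and also starts out very close to 0.5 for small random vectors, so a value near 0.5 does not by itself prove the embeddings have collapsed.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_triplet_loss(user, pos, neg):
    # Same formula as in the post: 1 - sigmoid(u . (p - n))
    score = sum(u * (p - n) for u, p, n in zip(user, pos, neg))
    return 1.0 - sigmoid(score)

dim = 50

# Case 1: all latent vectors are exactly zero.
zero = [0.0] * dim
loss_at_zero = bpr_triplet_loss(zero, zero, zero)

# Case 2: small random vectors (assumed range [-0.05, 0.05]).
rng = random.Random(0)  # hypothetical seed, for repeatability only

def small_vector():
    return [rng.uniform(-0.05, 0.05) for _ in range(dim)]

loss_near_zero = bpr_triplet_loss(small_vector(), small_vector(), small_vector())

print(loss_at_zero)    # 0.5
print(loss_near_zero)  # close to 0.5, exact value depends on the seed
```

If the reported loss never moves away from 0.5, it may be worth inspecting the embedding weights directly to tell "collapsed to zero" apart from "barely trained".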
Maybe you added too much regularization?
Even when I set regularization to 0, it still happens. You can check my code (sorry for the messy code, I'm still debugging):
training_data = []
tuning_data = []
all_users = set()
all_pois = set()
all_words = set()
user_pois = dd(set)
dim = 50
reg = 0.01
max_words = 200
num_negs = 5
learning_rate = 0.001
num_epochs = 500
batch_size = 256
def load_data():
    print 'loading data...'
    for filename in ['new_training.txt', 'new_tuning.txt']:
        print filename
        for line in open(DIR + filename):
            line = map(int,line.split())
            user = line[0]
            poi = line[1]
            all_users.add(user)
            all_pois.add(poi)
            words = line[2:max_words+2]
            all_words.update(words)
            if filename == 'new_training.txt':
                training_data.append([user,poi,words])
                user_pois[user].add(poi)
            else:
                tuning_data.append([user,poi,words])
load_data()
num_users = max(all_users)
num_pois = max(all_pois)
num_words = max(all_words)
all_users = list(all_users)
all_pois = list(all_pois)
all_words = list(all_words)
def identity_loss(y_true, y_pred):
    return K.mean(y_pred - 0 * y_true)
def bpr_triplet_loss(X):
    user_latent, pos_poi_latent, neg_poi_latent = X
    # BPR loss
    loss = 1.0 - K.sigmoid(
        K.sum(user_latent * (pos_poi_latent - neg_poi_latent), axis=-1, keepdims=True))
    return loss
def build_model():
    user_input = Input(shape=(1,), dtype='int32', name='user_input')
    pos_poi_input = Input(shape=(1,), dtype='int32', name='pos_poi_input')
    neg_poi_input = Input(shape=(1,), dtype='int32', name='neg_poi_input')
    # Embedding layer
    Embedding_user = Embedding(input_dim = num_users+1, output_dim = dim, name = 'user_embedding',
                               embeddings_initializer = RandomUniform(), embeddings_regularizer = l2(reg), input_length=1)
    item_embedding_layer = Embedding(input_dim = num_pois+1, output_dim = dim, name='item_embedding',
                                     embeddings_initializer = RandomUniform(), embeddings_regularizer = l2(reg), input_length=1)
    user_latent = Embedding_user(user_input)
    user_latent = Flatten()(user_latent)
    pos_poi_latent = item_embedding_layer(pos_poi_input)
    pos_poi_latent = Flatten()(pos_poi_latent)
    neg_poi_latent = item_embedding_layer(neg_poi_input)
    neg_poi_latent = Flatten()(neg_poi_latent)
    loss = merge([user_latent, pos_poi_latent, neg_poi_latent], mode=bpr_triplet_loss, output_shape=(1,))
    # prediction = Dense(1, activation='sigmoid', name='prediction')(predict_score)
    model = Model(input=[user_input, pos_poi_input, neg_poi_input], output=loss)
    return model
training_idx = range(len(training_data))*num_negs
np.random.shuffle(training_idx)
mini_batch_size = 200000
start_idx = 0
end_idx = mini_batch_size
def get_training_instances():
    global start_idx, end_idx
    user_input, pos_poi_input, neg_poi_input = [], [], []
    if start_idx >= len(training_idx):
        np.random.shuffle(training_idx)
        start_idx = 0
        end_idx = mini_batch_size
    for samp in xrange(start_idx, min(end_idx, len(training_idx))):
        idx = training_idx[samp]
        user, poi, words = training_data[idx]
        user_input.append(user)
        pos_poi_input.append(poi)
        while True:
            neg_poi = all_pois[np.random.randint(len(all_pois))]
            if neg_poi not in user_pois[user]:
                break
        neg_poi_input.append(neg_poi)
    start_idx = end_idx
    end_idx += mini_batch_size
    return user_input, pos_poi_input, neg_poi_input
def get_validation_set():
    user_input, pos_poi_input, neg_poi_input = [],[],[]
    for user, pos_poi, words in tuning_data:
        user_input.append(user)
        pos_poi_input.append(pos_poi)
        while True:
            neg_poi = all_pois[np.random.randint(len(all_pois))]
            if neg_poi not in user_pois[user]:
                break
        neg_poi_input.append(neg_poi)
    return user_input, pos_poi_input, neg_poi_input
model = build_model()
optimizer = SGD(lr=learning_rate)
model.compile(optimizer=optimizer, loss=identity_loss)
val_user_input, val_pos_poi_input, val_neg_poi_input = get_validation_set()
x_validate = [np.array(val_user_input), np.array(val_pos_poi_input), np.array(val_neg_poi_input)]
y_validate = np.ones(len(val_user_input))
for epoch in xrange(num_epochs):
    user_input, pos_poi_input, neg_poi_input = get_training_instances()
    x_training = [np.array(user_input), np.array(pos_poi_input), np.array(neg_poi_input)]
    y_training = np.ones(len(user_input))
    model.fit(x_training, y_training,
              validation_data = (x_validate, y_validate),
              batch_size=batch_size, nb_epoch=1, shuffle=True
              )
model.save(DIR + 'model_test.txt')
It's hard to say what's going wrong. Are you sure you set regularization to 0.0? In the code above it's set to 0.01.
I would spend some time refactoring the code; that would make it easier for you to understand what's actually happening.
Even when I set regularization to 0.0, this still happens.
What is even stranger is that when I change the model to simple matrix factorization (using the dot product between user and item latent factors to fit the matrix), the issue still happens. When I then try to fit only the nonzero values (which are all 1), it still happens: the training loss goes to 1.0, because all the dot products are 0 (I use 'mse' as the loss metric).
I think there must be something wrong somewhere in my code.
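The 1.0 figure follows directly from the arithmetic: if every prediction is 0 and every target is 1, each squared error is 1, and so is their mean. A tiny sketch with made-up values (the list length and the helper name `mse` are arbitrary assumptions):

```python
def mse(y_true, y_pred):
    # Mean squared error over paired targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

targets = [1.0] * 4          # only the observed (nonzero) entries, all equal to 1
collapsed_preds = [0.0] * 4  # dot products of all-zero latent vectors
collapsed_mse = mse(targets, collapsed_preds)

print(collapsed_mse)  # 1.0
```

So a training loss stuck at exactly 1.0 in that setup is consistent with predictions that are all zero.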
I think I know the problem. The problem is that the Embedding layers of current keras are broken with regularization!!
According to the above link, people normally don't use regularization for embeddings. But I really don't understand why the model doesn't over-fit without regularization. Are there any constraints that keep the model from over-fitting?
Now, we have three choices: 1) do not use any regularization at all, 2) weight the updates for each user and item based on its frequency, as suggested in the above link, 3) use embeddings_constraint. I have tested option 3 in the simple matrix factorization model: it produces a better validation loss, although the training loss is worse. I will test this option in your model later.
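For option 3, it may help to see what a max-norm style constraint does to a single embedding row. The sketch below is a plain-Python illustration of the idea (rescale a vector only when its norm exceeds a cap); it is not Keras's own implementation, and the function name `apply_max_norm` and the cap of 1.0 are assumptions made for the example.

```python
import math

def apply_max_norm(vector, max_norm=1.0, eps=1e-7):
    # Leave the vector alone if it is already within the cap;
    # otherwise rescale it so its L2 norm is (almost exactly) max_norm.
    norm = math.sqrt(sum(v * v for v in vector))
    if norm <= max_norm:
        return list(vector)
    scale = max_norm / (norm + eps)
    return [v * scale for v in vector]

small = apply_max_norm([0.3, 0.4])   # norm 0.5, left unchanged
large = apply_max_norm([3.0, 4.0])   # norm 5.0, rescaled to norm ~1.0

print(small)
print(large)
```

Unlike an L2 penalty, a constraint of this kind never pulls small vectors toward zero; it only stops large ones from growing, which may be part of why it behaved better in the matrix factorization test.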
Updating every embedding weight after every minibatch when an L2 penalty is applied is the right behaviour. It is perfectly fine to use L2 regularization, but the penalty needs to be small enough.
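A rough way to judge "small enough": under plain SGD, an embedding row that does not appear in a minibatch still receives the gradient of the penalty. Assuming the penalty has the form reg * sum(w^2), so that its gradient is 2 * reg * w, each step multiplies such a row by (1 - 2 * lr * reg). The sketch below just evaluates that factor; the function name and the step counts are assumptions, and it ignores the data gradient entirely.

```python
def decay_factor(lr, reg, steps):
    # Fraction of a weight that survives `steps` SGD updates when the only
    # gradient it receives comes from the penalty reg * w**2.
    return (1.0 - 2.0 * lr * reg) ** steps

# Values taken from the posted script: lr = 0.001, reg = 0.01.
after_100k = decay_factor(lr=0.001, reg=0.01, steps=100000)
no_penalty = decay_factor(lr=0.001, reg=0.0, steps=100000)

print(after_100k)  # roughly 0.135, i.e. the row has shrunk to about 13.5%
print(no_penalty)  # 1.0
```

Rows for rarely seen users or items get few data-driven updates to counteract this shrinkage, so with a large penalty they can drift toward zero over a long run.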
In your case, I would:
- Try to refactor the code. You have functions that close over and modify variables defined in the outer scope, for one: I worry that you are making mistakes you (or I) can't see because your training code is not clear.
- You are running a huge number of epochs.
- Try a different optimizer: Adam or Adagrad.
- Try a larger learning rate.
The good news is that there is an issue with your code or hyperparameters: the framework is fine.
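As one possible shape for the refactoring suggested above, the negative sampling can be written as generators that take everything they need as arguments instead of reading and mutating module-level variables. This is only a sketch on toy data: the function names, the argument names, and the toy interactions are all hypothetical, and it is not a drop-in replacement for the posted script.

```python
import random

def sample_triplets(interactions, all_items, user_items, num_negs, rng):
    """Yield (user, pos_item, neg_item), num_negs triplets per interaction."""
    order = list(range(len(interactions))) * num_negs
    rng.shuffle(order)
    for idx in order:
        user, pos_item = interactions[idx]
        # Rejection sampling; this would loop forever for a user who has
        # interacted with every item, so real code should guard against that.
        while True:
            neg_item = rng.choice(all_items)
            if neg_item not in user_items[user]:
                break
        yield user, pos_item, neg_item

def batches(triplets, batch_size):
    """Group an iterable of triplets into lists of at most batch_size."""
    batch = []
    for triplet in triplets:
        batch.append(triplet)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Toy data (hypothetical): two users, four items.
interactions = [(0, 1), (0, 2), (1, 3)]
all_items = [1, 2, 3, 4]
user_items = {0: {1, 2}, 1: {3}}

triplets = list(sample_triplets(interactions, all_items, user_items,
                                num_negs=2, rng=random.Random(0)))
grouped = list(batches(triplets, batch_size=4))
print(len(triplets), [len(b) for b in grouped])  # 6 [4, 2]
```

Because no state lives outside the functions, each piece can be checked in isolation (for example, that no sampled negative is an item the user has already seen) before it is wired into model.fit.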