lightfm
Getting same recommendation when using user features
Hello!
I'm building a recommendation system and I'm running into a problem similar to issue #320, and I can't figure out what's wrong.
Specifically, I trained my model twice: once without user features and once with them. When I don't use them, I get decent recommendations for every user. Many of them are very similar due to the nature of my data (a few items are very popular among users), but overall the recommendations look fine.
I would then like to include user features to personalize the recommendations further. However, when I do, I get almost the same prediction for every user. These predictions look even more similar than when I don't use user features.
The way I'm using user features is the following:
First I give lightfm the user features' names in the dataset fit function:
```python
dataset.fit((data['user_index']), (data['item_index']), user_features=user_features_names)
```
I then build a user features list, where the first element of each entry is the user id and the second is a list containing the values of the features for that user:
```
(user_1, [feature_1, feature_2, ..., feature_n])
(user_2, [feature_1, feature_2, ..., feature_n])
```
I build the user features using this as the input to the build_user_features function, and then train and predict with my model.
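For reference, build_user_features essentially produces a sparse matrix of shape (n_users, n_users + n_features): one identity column per user plus one column per feature, with each row scaled to sum to 1 when normalize=True (the default). A minimal dense sketch of that construction, with hypothetical user and feature names:

```python
import numpy as np

# Hypothetical users and one-hot feature names, stand-ins for real data.
users = ["user_1", "user_2"]
feature_names = ["gender_f", "gender_m", "country_us", "country_ar"]

# (user, [feature names]) pairs, the shape build_user_features accepts.
user_features_raw = [
    ("user_1", ["gender_f", "country_us"]),
    ("user_2", ["gender_m", "country_ar"]),
]

user_idx = {u: i for i, u in enumerate(users)}
feat_idx = {f: len(users) + i for i, f in enumerate(feature_names)}

# One identity column per user plus one column per feature; with
# normalize=True each row is scaled to sum to 1.
mat = np.zeros((len(users), len(users) + len(feature_names)))
for user, feats in user_features_raw:
    r = user_idx[user]
    mat[r, r] = 1.0                      # the user's own identity feature
    for f in feats:
        mat[r, feat_idx[f]] = 1.0
mat = mat / mat.sum(axis=1, keepdims=True)
```

Note that the normalization divides each row by its total weight, which is one reason strongly skewed raw feature values can wash out the per-user identity signal.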
I am using only categorical variables (one-hot encoded), since I got an error when trying to pass continuous variables to LightFM this way. I also ran a random search over the hyperparameters and found a small improvement, but nothing significant (still almost the same prediction for every user).
Is there something I'm doing wrong? I would really appreciate any help!
Hello, did you find any solution? I am having a similar problem.
Yes! It was a lot of things, but the main problem seemed to be that I wasn't normalizing my variables properly. Let me know if you need any help!
Thank you for your help.
I have 20 users and 50 items. Each user rated all the items, so I divided my data into train and test sets. I am trying to predict the ratings. My training data looks like this:

| userId | itemId | rating |
| ------ | ------ | ------ |
| 1      | 47     | 4      |
| 1      | 202    | 3      |
| 1      | 101    | 5      |
Here is my model:
```python
n_latent_factors = 50

# User embeddings
user_input = Input(shape=(1,), name='User_Input')
user_embeddings = Embedding(input_dim=n_users, output_dim=n_latent_factors,
                            input_length=1, name='User_Embedding')(user_input)
user_vector = Flatten(name='User_Vector')(user_embeddings)

# Item embeddings
item_input = Input(shape=(1,), name='Item_Input')
item_embeddings = Embedding(input_dim=n_items, output_dim=n_latent_factors,
                            input_length=1, name='Item_Embedding')(item_input)
item_vector = Flatten(name='Item_Vector')(item_embeddings)

# Concatenate the embeddings and stack dense layers
merged_vectors = concatenate([user_vector, item_vector], name='Concatenate')
dense_layer_1 = Dense(200, activation='relu')(merged_vectors)
dense_layer_1 = Dropout(0.1)(dense_layer_1)
dense_layer_1 = BatchNormalization()(dense_layer_1)
dense_layer_2 = Dense(100, activation='relu')(dense_layer_1)
dense_layer_2 = Dropout(0.1)(dense_layer_2)
dense_layer_2 = BatchNormalization()(dense_layer_2)
dense_layer_3 = Dense(100, activation='relu')(dense_layer_2)
dense_layer_3 = Dropout(0.1)(dense_layer_3)
dense_layer_4 = BatchNormalization()(dense_layer_3)
dense_layer_5 = Dense(50, activation='relu')(dense_layer_4)
dense_layer_5 = Dropout(0.1)(dense_layer_5)
dense_layer_6 = BatchNormalization()(dense_layer_5)
dense_layer_7 = Dense(20, activation='relu')(dense_layer_6)
dense_layer_7 = Dropout(0.1)(dense_layer_7)
dense_layer_8 = BatchNormalization()(dense_layer_7)

result = Dense(1, activation='relu')(dense_layer_8)
model_DL = Model([user_input, item_input], result)

optimizer = Adam(lr=0.001)
model_DL.compile(loss='mean_squared_error', optimizer=optimizer)
history = model_DL.fit(x=[x_train['userId'], x_train['itemId']], y=y_train,
                       batch_size=batch_size, epochs=epochs, verbose=2,
                       validation_data=([x_test['userId'], x_test['itemId']], y_test))
```
The problem is that the network predicts the same rating (on both the training and test sets) for each user no matter what the item is (except for the first item, and I don't know why). I have also tried a 'tanh' activation on the last layer instead of 'relu' and then scaled the output to the range 1 to 5. Nothing changed.
Am I missing something? Thank you.
Why such a deep architecture with your tiny dataset?
If you want to use Keras, the simplest thing you can do is use a Dot layer on the user and item embeddings (with the Embedding layers' output_dim smaller than 20, since the rank of your original matrix cannot be higher).
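The Dot-layer model suggested above is just classic matrix factorization: predict a rating as the dot product of a user vector and an item vector. A minimal NumPy sketch of the same idea trained with SGD, on synthetic data (all names and numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 50, 8   # embedding size well below 20, as suggested

# Synthetic ratings for illustration; replace with real (userId, itemId, rating).
true_u = rng.normal(size=(n_users, k))
true_i = rng.normal(size=(n_items, k))
ratings = [(u, i, float(true_u[u] @ true_i[i]))
           for u in range(n_users) for i in range(n_items)]

# Dot-product model: prediction = user_vec . item_vec,
# which is what a Keras Dot layer over two Embedding layers computes.
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lr = 0.01
for epoch in range(50):
    for u, i, r in ratings:
        err = U[u] @ V[i] - r                        # squared-error gradient
        U[u], V[i] = U[u] - lr * err * V[i], V[i] - lr * err * U[u]

preds = np.array([U[u] @ V[i] for u, i, _ in ratings])
targets = np.array([r for _, _, r in ratings])
mse = float(np.mean((preds - targets) ** 2))
```

With only 20 users and 50 items, this shallow model has far fewer parameters to overfit than the deep stack above, and the per-item dot products cannot collapse to a single constant the way a saturated deep network can.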
Hey @clementechiu, can you describe your solution for using user features in LightFM prediction? I have the same issue: when I fit the model with user features or item features, I get almost the same results for all users.
All my features are categorical, and in the dataset I added all their unique values. Thanks a lot.
Hi,
The solution will depend on what your problem is. In my case particularly, I had to normalize the variables myself, since LightFM's normalization was not adequate for my data. I also remember struggling a lot with the format LightFM requires for the input. I ended up using a list where the first element was the user index and the second a dictionary of the form {'feature_1': value_feature_1, ...}. You can see more details in LightFM's documentation.
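As a sketch of that normalization step (feature names and values below are hypothetical), min-max scaling each continuous feature across users before handing the (user, {feature: weight}) pairs to build_user_features:

```python
# Hypothetical raw continuous features per user, stand-ins for real columns.
raw = {
    "user_1": {"age": 23.0, "monthly_spend": 120.0},
    "user_2": {"age": 54.0, "monthly_spend": 30.0},
    "user_3": {"age": 31.0, "monthly_spend": 800.0},
}

# Min-max scale each feature across all users so that no single feature's
# magnitude dominates the user representation.
names = sorted(next(iter(raw.values())))
lo = {f: min(feats[f] for feats in raw.values()) for f in names}
hi = {f: max(feats[f] for feats in raw.values()) for f in names}

user_features = [
    (u, {f: (v - lo[f]) / (hi[f] - lo[f]) for f, v in feats.items()})
    for u, feats in raw.items()
]
```

Each entry is now in the (user, {feature: weight}) shape described above and can be passed to dataset.build_user_features; if you want to keep these weights exactly as computed, pass normalize=False there so LightFM does not rescale the rows again.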
Let me know if you have more doubts, or show me some code so I can see what specifically I can help you with.
Hello everyone, I have a dataset of 6 unique users with 50 unique items. Among these users, two of them have the same features. When predicting items for these two users, I get the same result. But when I predict without using user features, I get different results. Can someone please help and provide a solution for how to tackle this issue?
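One likely cause, offered as a guess: LightFM represents a user as the sum of its feature embeddings, so two users with identical feature rows get identical representations, and therefore identical predictions, unless each user also carries its own identity feature (Dataset.fit's user_identity_features=True, which is the default). A tiny NumPy sketch of the effect, with made-up embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)
k = 4

# Made-up embeddings: one vector per feature, plus per-user identity features.
feature_emb = {
    "gender_f": rng.normal(size=k),
    "country_us": rng.normal(size=k),
    "id:user_a": rng.normal(size=k),
    "id:user_b": rng.normal(size=k),
}
item_emb = rng.normal(size=k)

def score(features):
    # LightFM-style user representation: the sum of the user's feature embeddings.
    user_vec = sum(feature_emb[f] for f in features)
    return float(user_vec @ item_emb)

shared = ["gender_f", "country_us"]
# Users a and b share every feature, so without identity features their
# scores for any item are forced to be identical:
assert score(shared) == score(shared)
# With a per-user identity feature each, the scores can differ:
s_a = score(shared + ["id:user_a"])
s_b = score(shared + ["id:user_b"])
```

So if you build the feature matrix yourself (or disable identity features), make sure each user keeps a column of its own; otherwise identical feature rows will always yield identical rankings.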
Yes! It was a lot of things, but the main problem seemed to be that I wasn't normalizing my variables properly. Let me know if you need any help!
Hey, I am facing a similar issue. Could you please explain how you normalized properly?
After a long struggle: https://github.com/lyst/lightfm/issues/353