
Getting same recommendation when using user features

Open clementechiu opened this issue 4 years ago • 9 comments

Hello!

I'm building a recommendation system, and I'm having a similar problem to Issue #320, and I can't seem to find what is wrong.

Particularly, I trained my model twice: once without using user features and the other time using them. When I don't use them, I get decent recommendations for every user. Many of them are very similar due to the nature of my data (a few items are very popular among users), but overall recommendations look fine.

I would like then to include user features to personalize even more recommendations. However, when I do it, I get almost the same prediction for every user. These predictions look even more similar than when I don't use user features.

The way I'm using user features is the following:

First I give lightfm the user features' names in the dataset fit function:

dataset.fit((data['user_index']), (data['item_index']), user_features=user_features_names)

I then build a user features list, where each element is a tuple of the user id and a list containing the values of the features for that user:

(user_1, [feature_1, feature_2, ..., feature_n])
(user_2, [feature_1, feature_2, ..., feature_n])

I build the user features by passing this list to the build_user_features function, and then train and predict with my model.
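For reference, the (user_id, [features]) tuples described above can be assembled like this. This is a minimal sketch in plain Python; the column names and values are invented for illustration, and the resulting list and token set are what you would hand to LightFM's Dataset.fit(..., user_features=...) and Dataset.build_user_features():

```python
# Hypothetical raw data: one row per user, categorical columns only.
raw = [
    {"user_index": "u1", "age_bucket": "18-25", "country": "AR"},
    {"user_index": "u2", "age_bucket": "26-35", "country": "CL"},
]

def to_feature_tuples(rows, id_col="user_index"):
    # Encode each categorical as a "column:value" token so that distinct
    # columns with overlapping values do not collide in the feature space.
    tuples = []
    for row in rows:
        feats = [f"{col}:{val}" for col, val in row.items() if col != id_col]
        tuples.append((row[id_col], feats))
    return tuples

user_feature_tuples = to_feature_tuples(raw)

# The unique tokens are what goes to Dataset.fit(..., user_features=...):
all_feature_names = sorted({f for _, feats in user_feature_tuples for f in feats})
```
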

I am using only categorical variables (one-hot encoded), since I got an error when trying to pass continuous variables to LightFM in this way. I also did a random search to find the best hyperparameters to train the model, and found a small improvement but nothing significant (still almost the same prediction for every user).
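One common workaround for the continuous-variable error mentioned above is to bucket continuous values into categories before one-hot encoding them. A quick sketch of the idea (the bucket edges and labels here are arbitrary examples, not anything LightFM prescribes):

```python
def bucketize(value, edges, labels):
    # Return the label of the first bucket whose upper edge exceeds value;
    # values past the last edge fall into the final bucket.
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

ages = [17, 24, 40, 71]
labels = ["age:<18", "age:18-30", "age:30-60", "age:60+"]
buckets = [bucketize(a, [18, 30, 60], labels) for a in ages]
```
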

Is there something I'm doing wrong? I would really appreciate any help!

clementechiu avatar Nov 05 '19 12:11 clementechiu

Hello did you find any solution? I am having a similar problem

charbelc avatar Mar 02 '20 14:03 charbelc

Yes! It was a lot of things, but the main problem seemed to be that I wasn't normalizing my variables properly. Let me know if you need any help!

On Mon, Mar 2, 2020 at 11:20, charc ([email protected]) wrote:

Hello did you find any solution? I am having a similar problem


clementechiu avatar Mar 02 '20 18:03 clementechiu

Thank you for your help.

I have 20 users and 50 items. Each user rated all the items, so I divided my data into train and test sets. I am trying to predict the ratings. My training data looks like this:

userId  itemId  rating
1       47      4
1       202     3
1       101     5

Here is my model:

```python
from keras.layers import (Input, Embedding, Flatten, Dense, Dropout,
                          BatchNormalization, concatenate)
from keras.models import Model
from keras.optimizers import Adam

n_latent_factors = 50

# User embeddings
user_input = Input(shape=(1,), name='User_Input')
user_embeddings = Embedding(input_dim=n_users, output_dim=n_latent_factors,
                            input_length=1, name='User_Embedding')(user_input)
user_vector = Flatten(name='User_Vector')(user_embeddings)

# Item embeddings
item_input = Input(shape=(1,), name='Item_Input')
item_embeddings = Embedding(input_dim=n_items, output_dim=n_latent_factors,
                            input_length=1, name='item_Embedding')(item_input)
item_vector = Flatten(name='item_Vector')(item_embeddings)

# Concatenate and stack dense layers
merged_vectors = concatenate([user_vector, item_vector], name='Concatenate')
dense_layer_1 = Dense(200, activation='relu')(merged_vectors)
dense_layer_1 = Dropout(0.1)(dense_layer_1)
dense_layer_1 = BatchNormalization()(dense_layer_1)
dense_layer_2 = Dense(100, activation='relu')(dense_layer_1)
dense_layer_2 = Dropout(0.1)(dense_layer_2)
dense_layer_2 = BatchNormalization()(dense_layer_2)
dense_layer_3 = Dense(100, activation='relu')(dense_layer_2)
dense_layer_3 = Dropout(0.1)(dense_layer_3)
dense_layer_4 = BatchNormalization()(dense_layer_3)
dense_layer_5 = Dense(50, activation='relu')(dense_layer_4)
dense_layer_5 = Dropout(0.1)(dense_layer_5)
dense_layer_6 = BatchNormalization()(dense_layer_5)
dense_layer_7 = Dense(20, activation='relu')(dense_layer_6)
dense_layer_7 = Dropout(0.1)(dense_layer_7)
dense_layer_8 = BatchNormalization()(dense_layer_7)

result = Dense(1, activation='relu')(dense_layer_8)
model_DL = Model([user_input, item_input], result)

optimizer = Adam(lr=0.001)
model_DL.compile(loss='mean_squared_error', optimizer=optimizer)
history = model_DL.fit(x=[x_train['userId'], x_train['itemId']], y=y_train,
                       batch_size=batch_size, epochs=epochs, verbose=2,
                       validation_data=([x_test['userId'], x_test['itemId']], y_test))
```

The problem is that the network is predicting the same rating (for both the training set and the test set) for each user, no matter what the item is (except for the first item... I don't know why!). I have also tried a 'tanh' activation function on the last layer instead of 'relu', then scaled the output to the range 1 to 5. Nothing changed.

Am I missing something? Thank you

charbelc avatar Mar 03 '20 09:03 charbelc

Why such a deep architecture with your tiny dataset?

If you want to use Keras, the simplest thing you can do is use a Dot layer on the user and item embeddings (with output_dim of the Embedding layers smaller than 20, since the rank of your original matrix cannot be higher).
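The Dot-layer suggestion boils down to plain matrix factorization: the predicted rating is the inner product of a user embedding and an item embedding. A minimal NumPy sketch of that idea, with randomly initialized embeddings and a latent dimension below 20 per the rank argument above (all names and numbers here are illustrative, not from the thread's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 50, 10   # latent dim k < min(n_users, n_items)

P = rng.normal(scale=0.1, size=(n_users, k))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, k))  # item embeddings

def predict(u, i):
    # Equivalent of Keras Dot(axes=1)([user_vector, item_vector])
    return P[u] @ Q[i]

# All user-item scores at once: one matrix product.
scores = P @ Q.T
```

In training you would learn P and Q by minimizing squared error on the observed ratings; the point is that the score head is a single dot product, not a deep stack of dense layers.
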

FrancescoI avatar Mar 03 '20 09:03 FrancescoI

Hey @clementechiu, can you describe the solution for using user features in LightFM prediction? I have the same issue: when I fit the model with user features or item features, I get almost the same results for all users.

All my features are categorical, and when fitting the dataset I add all their unique values. Thanks a lot.

eveTu avatar Mar 19 '20 14:03 eveTu

Hi,

The solution will depend on what your problem is. In my case particularly, I had to normalize the variables myself, since LightFM's normalization was not adequate for my data. I also remember struggling a lot with the input format LightFM requires. I ended up using a list where the first element was the user index and the second one a dictionary of the form {'feature_1': 'value_feature_1', ...}. You can see more details in LightFM's documentation.
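A sketch of that (user_id, {feature: weight}) format, plus the kind of manual per-user normalization the comment above describes (rescaling each user's weights to sum to 1). The feature names and weights are invented for illustration:

```python
def normalize_weights(user_feature_dicts):
    # Rescale each user's feature weights so they sum to 1, so that no
    # single large raw value (e.g. a count) dominates a user's representation.
    out = []
    for user_id, feats in user_feature_dicts:
        total = sum(feats.values())
        out.append((user_id, {f: w / total for f, w in feats.items()}))
    return out

raw = [
    ("u1", {"age:18-25": 1.0, "sessions": 30.0}),
    ("u2", {"age:26-35": 1.0, "sessions": 5.0}),
]
normalized = normalize_weights(raw)
```

The normalized list is in the shape LightFM's Dataset.build_user_features accepts as input.
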

Let me know if you have more doubts, or show me some code to see what specifically I can help you with

clementechiu avatar Mar 19 '20 16:03 clementechiu

Hello everyone, I have a dataset of 6 unique users and 50 unique items. Two of these users have the same features. When predicting items for these two users, I get the same results. But when I predict without using user features, I get different results. Can someone please help and suggest how to tackle this issue?
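On why two users with identical features get identical scores: if a user's representation is built only from shared features, the model literally cannot tell the two users apart. A common fix is to keep a per-user identity feature alongside the shared ones. A toy NumPy illustration of the effect (all embedding values here are made up):

```python
import numpy as np

# Feature embedding rows: feature_a, feature_b, id_u1, id_u2 (invented numbers).
E = np.array([[0.5, 0.1],
              [0.2, 0.3],
              [0.9, 0.0],
              [0.0, 0.9]])

# Both users share features a and b; without identity features their
# representations are the same vector, so every score is identical.
shared_only_u1 = E[0] + E[1]
shared_only_u2 = E[0] + E[1]

# Adding a per-user identity feature breaks the tie.
with_id_u1 = E[0] + E[1] + E[2]
with_id_u2 = E[0] + E[1] + E[3]

item = np.array([1.0, -1.0])
```
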

mm27368 avatar May 09 '20 12:05 mm27368

> Yes! It was a lot of things, but the main problem seemed to be that I wasn't normalizing my variables properly. Let me know if you need any help!

Hey, I am facing a similar issue. Could you please explain how you normalized properly?

rohit-u2 avatar May 22 '20 13:05 rohit-u2

After a long struggle, this helped me: https://github.com/lyst/lightfm/issues/353

awaiskaleem avatar Jul 08 '20 22:07 awaiskaleem