LibRecommender
Hyperparameters for DeepFM
Hello,
We are having an issue where our model overfits more with every epoch, and the other evaluation metrics are also very low. We are quite new to Python and coding and were wondering if you could help us. We are using your LibRecommender DeepFM code together with the MovieLens 1M dataset.
Here is the part of the code with the parameters. What should we change?
# imports assumed by this snippet; reset_state is the helper defined
# in the repo's example scripts
from libreco.data import DatasetFeat
from libreco.algorithms import DeepFM

sparse_col = ["sex", "occupation", "genre1", "genre2", "genre3"]
dense_col = ["age"]
user_col = ["sex", "age", "occupation"]
item_col = ["genre1", "genre2", "genre3"]

# train_data / eval_data are pandas DataFrames loaded and split beforehand
train_data, data_info = DatasetFeat.build_trainset(
    train_data, user_col, item_col, sparse_col, dense_col
)
eval_data = DatasetFeat.build_testset(eval_data)
print(data_info)

# do negative sampling, assume the data only contains positive feedback
train_data.build_negative_samples(data_info, item_gen_mode="random",
                                  num_neg=1, seed=2020)
eval_data.build_negative_samples(data_info, item_gen_mode="random",
                                 num_neg=1, seed=2222)

reset_state("DeepFM")
deepfm = DeepFM("ranking", data_info, embed_size=16, n_epochs=2,
                lr=1e-4, lr_decay=False, reg=None, batch_size=1,
                num_neg=1, use_bn=False, dropout_rate=None,
                hidden_units="128,64,32", tf_sess_config=None)
deepfm.fit(train_data, verbose=2, shuffle=True, eval_data=eval_data,
           metrics=["loss", "balanced_accuracy", "roc_auc", "pr_auc",
                    "precision", "recall", "map", "ndcg"])
Thank you for your help!
Salta
Yes, we are using the data provided in the repository
Due to the GitHub space limitation, the data in the repository is only part of the original 1M dataset (100 thousand rows), so it's easy to overfit.
You can try reducing embed_size, lr, or hidden_units, increasing batch_size, or turning on use_bn. For example,
deepfm = DeepFM("ranking", data_info, embed_size=4, n_epochs=2,
lr=3e-5, lr_decay=False, reg=None, batch_size=2048,
num_neg=1, use_bn=True, dropout_rate=None,
hidden_units="32,16", tf_sess_config=None)
Thanks for the fast reply! By how much can we reduce the lr and hidden_units? Is there a specific sequence we should follow?
Also, we noticed that when we increase the batch size, the recommender system keeps recommending the same 5-10 movies to all users. Only when we set the batch size to 1 do we get different recommendations. Is this normal?
And how can we increase use_bn?
- I think there is no specific rule about this, so all I can say is follow your heart. As you gain more experience, you will figure out what to do.
- In some sense it is normal. A large batch size tends to make the training stable, in other words the training is more effective. In a normal movie recommender system, some movies have to be recommended more frequently than others because they are popular: most people will like them when they are recommended. A small batch size tends to make the training unstable, and the model becomes useless. That's why the metrics are low even though you got different recommendations: the model is essentially making random recommendations. (See the sketch after this list for a quick way to check how concentrated the recommendations are.)
- Sorry, I expressed it in the wrong way. I meant setting use_bn to True.
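If you want to check how concentrated the recommendations are, here is a quick sketch (assuming the older API used in this thread, where recommend_user returns a list of (item, score) pairs; the user id range is just an example):

from collections import Counter

# count how often each movie appears in the top-10 lists of a sample of users
counter = Counter()
for user in range(1, 501):  # example range of user ids
    recs = deepfm.recommend_user(user=user, n_rec=10)
    counter.update(item for item, _ in recs)

# if a few items dominate, the model is mostly recommending popular movies
print(counter.most_common(10))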