NotImplementedError: Data has duplicate values
data_model = ItemColdStartData(
    training_data,
    *training_data.columns,  # userid, itemid
    item_features=content_feature_df,
    seed=seed)
print(data_model)
This is where I get the error: NotImplementedError: Data has duplicate values
My dataframe has multiple entries per user, and I can't drop them. Any help would be appreciated.
Hi!
The problem is not that your data contains multiple entries for a user, but that your data contains multiple entries of the same user-item pair. It's like having multiple ratings for the same movie from the same user. This is not a standard collaborative filtering scenario.
You need to deduplicate such entries, e.g., like this:
dedup_data = data.drop_duplicates(subset=['userid', 'movieid'])
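If it helps, here is a minimal pandas sketch for inspecting the duplicated user-item pairs before deciding how to collapse them. The column names (userid, itemid, rating) and the toy data are assumptions for illustration, so adjust them to your dataframe. Averaging the feedback value is just one possible alternative to dropping rows, not something the library requires.

```python
import pandas as pd

# Toy interaction log; the column names and values are assumptions.
data = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "itemid": [10, 10, 20, 10, 30],
    "rating": [4.0, 2.0, 5.0, 3.0, 1.0],
})

pair = ["userid", "itemid"]

# Show every row whose (userid, itemid) pair occurs more than once.
dup_rows = data[data.duplicated(subset=pair, keep=False)]
print(dup_rows)

# Option 1: keep only the last row (in current row order) for each pair.
dedup_last = data.drop_duplicates(subset=pair, keep="last")

# Option 2: collapse each pair into a single row by averaging the rating.
dedup_mean = data.groupby(pair, as_index=False)["rating"].mean()
```

Note that multiple rows for the same user with different items are untouched by either option; only repeated user-item pairs are collapsed. If you have a timestamp column, sorting by it before Option 1 would keep the most recent interaction.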
Understood, thanks for the help.
I'm facing one more blocker: data_model.prepare() takes a very long time and appears to freeze when I run that step. Any idea why? I know my dataset is big, but are there any optimizations I can apply?
