
np.setdiff1d is too slow

Open sharthZ23 opened this issue 3 years ago • 0 comments

If the user_id and item_id columns have a categorical dtype (pandas CategoricalDtype), np.setdiff1d becomes very slow on large inputs (more than 10 million unique values). A possible solution is to replace https://github.com/MobileTeleSystems/RecTools/blob/76c41e0e039cd050b46ec0f6cb7f0f668fca9574/rectools/model_selection/time_split.py#L146 with

new_users = set(df_test[Columns.User].unique()) - set(df_train[Columns.User].unique()) 

The same applies to https://github.com/MobileTeleSystems/RecTools/blob/76c41e0e039cd050b46ec0f6cb7f0f668fca9574/rectools/model_selection/time_split.py#L150
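
For illustration, here is a minimal sketch of the set-based approach on toy data, with a check that it yields the same IDs as np.setdiff1d. The helper name new_ids and the sample values are hypothetical and not part of RecTools. The sketch only demonstrates that the results match; it does not measure the speed difference.

```python
import numpy as np
import pandas as pd


def new_ids(train_ids: pd.Series, test_ids: pd.Series) -> set:
    """Return IDs present in test_ids but absent from train_ids.

    Hypothetical helper: mirrors the set difference proposed above.
    """
    return set(test_ids.unique()) - set(train_ids.unique())


# Toy categorical columns standing in for df_train[Columns.User]
# and df_test[Columns.User].
train = pd.Series([1, 2, 3, 3], dtype="category")
test = pd.Series([3, 4, 5, 5], dtype="category")

result = new_ids(train, test)

# Reference result computed with np.setdiff1d on plain arrays.
baseline = set(np.setdiff1d(np.asarray(test), np.asarray(train)).tolist())
```

Note that the set-based version returns an unordered set, whereas np.setdiff1d returns a sorted array, so any downstream code that relies on ordering would need an explicit sort.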

sharthZ23, Aug 07 '22 20:08