rankfm

Error while fitting with a 200k user-item interaction matrix, item features and user features

Open seuriously opened this issue 3 years ago • 3 comments

I'm running the library on a virtual server with 64 GB of RAM. My data consist of:
- 200k distinct interactions between users and items
- a 52k x 11 user feature matrix
- a 2770 x 49 item feature matrix

All NA values are replaced by 0.

When I try to run it, I get this error: `AssertionError: user factors [v_u] are not finite - try decreasing feature/sample_weight magnitudes`. Sometimes it gives me an item factors error instead.

However, if I run it on 170k user interactions without user_features and item_features, it runs smoothly.

What does this error mean?
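The error text itself points at the cause: during fitting, the learned weights grew until they overflowed to inf (or became NaN), and large feature or sample_weight values are the usual trigger. This is not rankfm's code; it is a toy linear model fit with plain SGD on squared loss, where the function name `sgd_linear` and the sample values are illustrative assumptions. It shows how the same update rule that converges on a feature scaled to [0, 1] diverges on large raw values.

```python
import numpy as np

def sgd_linear(x, y, lr=0.1, epochs=100):
    """Toy model (not rankfm): fit y ~ x @ w with plain SGD on squared loss."""
    w = np.zeros(x.shape[1])
    # Overflow to inf (and inf - inf = nan) is the point of this demo,
    # so numpy's overflow/invalid RuntimeWarnings are silenced.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(epochs):
            for xi, yi in zip(x, y):
                grad = 2.0 * (xi @ w - yi) * xi  # gradient of (xi @ w - yi) ** 2
                w -= lr * grad
    return w

y = np.array([1.0, 0.0, 1.0])
x_scaled = np.array([[0.05], [0.0], [1.0]])         # feature already in [0, 1]
x_raw = np.array([[5000.0], [12000.0], [98000.0]])  # large unscaled values

print(np.isfinite(sgd_linear(x_scaled, y)).all())  # True
print(np.isfinite(sgd_linear(x_raw, y)).all())     # False
```

Each update multiplies the weight by about `1 - 2*lr*x**2`, which is far below -1 once `x` is in the thousands, so every step overshoots by orders of magnitude until float64 overflows. The factorization-machine updates in rankfm are more involved, but the same sensitivity to feature scale is a plausible reading of the assertion.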

seuriously, Nov 18 '21

Hi, I have the same issue. Have you managed to solve it?

My data consist of:
- 173k distinct interactions between users and items
- a 4k x 20 item feature dataframe
- a 370k x 30 user feature dataframe

When I try to run it, I get this error: `AssertionError: item weights [w_i] are not finite - try decreasing feature/sample_weight magnitudes`

So for now I can only run the model without the auxiliary user and item features.

ZoeLeung2021, Jun 13 '22

I have the same error too, even with an item features dataframe of 270 rows and just two columns, [product_id INT32, retailprice FLOAT64]. I also tried two different columns, [product_id, category_id INT16], with the same result.

Including sample_weight works. But if I add any user or product attributes, I get the error `AssertionError: item weights [w_i] are not finite`.

```python
from rankfm.rankfm import RankFM

model2 = RankFM(factors=20, loss='warp', max_samples=100, learning_schedule='invscaling')
model2.fit(user_item_train,
           item_features=item_attributes_train,
           # user_features=user_attributes_train,  # adding this also raises the error
           sample_weight=sample_weight_train,
           epochs=25,
           verbose=True)
```

```python
type(item_attributes_train)
# pandas.core.frame.DataFrame

item_attributes_train.dtypes
# PRODUCT_ID        int32
# RETAIL_PRICE    float64
# dtype: object

item_attributes_train.head(3)
#    PRODUCT_ID  RETAIL_PRICE
# 0       10162          1.75
# 1       10145          1.00
# 2      101433          7.95
```
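Before fitting, a quick check along these lines can show which feature columns are likely to trigger the assertion. `flag_risky_features` is a hypothetical helper, not part of rankfm. It skips the ID column (treated as a key rather than a feature value, assuming the `[item_id, feature_1, ...]` layout rankfm's feature frames appear to use) and flags any column with non-finite values or magnitudes above an arbitrary limit of 1.0, chosen to match features scaled to [0, 1].

```python
import numpy as np
import pandas as pd

def flag_risky_features(features, id_col, limit=1.0):
    """Hypothetical helper: list feature columns with non-finite values or |value| > limit."""
    risky = []
    for col in features.columns:
        if col == id_col:
            continue  # the ID column is a key, not a feature value
        values = features[col].to_numpy(dtype=float)
        if not np.isfinite(values).all() or np.abs(values).max(initial=0.0) > limit:
            risky.append(col)
    return risky

# The three rows shown in the head(3) output above.
items = pd.DataFrame({
    "PRODUCT_ID": [10162, 10145, 101433],
    "RETAIL_PRICE": [1.75, 1.00, 7.95],
})
print(flag_risky_features(items, id_col="PRODUCT_ID"))  # ['RETAIL_PRICE']
```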

jonathanswalton77, Dec 02 '22

I may have resolved the issue by re-expressing all numeric features as values scaled to between 0 and 1, and by one-hot encoding the category IDs (as you'd expect).
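A minimal sketch of that preprocessing with pandas: numeric columns are min-max scaled into [0, 1] and category IDs are one-hot encoded with `pd.get_dummies`. `prepare_item_features` is an illustrative name, not a rankfm function; the `CATEGORY_ID` column and its values are hypothetical, added alongside the `PRODUCT_ID` and `RETAIL_PRICE` values from the thread. NA handling is left out (the original poster filled NAs with 0 beforehand).

```python
import pandas as pd

def prepare_item_features(items, id_col, numeric_cols, categorical_cols):
    """Illustrative helper: return [id_col, scaled numerics..., one-hot categoricals...]."""
    out = items[[id_col]].copy()
    for col in numeric_cols:
        lo, hi = items[col].min(), items[col].max()
        span = hi - lo
        # Min-max scale into [0, 1]; a constant column becomes all zeros.
        out[col] = (items[col] - lo) / span if span > 0 else 0.0
    # One 0/1 column per category ID instead of one large integer column.
    dummies = pd.get_dummies(
        items[categorical_cols].astype(str), prefix=categorical_cols, dtype=float
    )
    return pd.concat([out, dummies], axis=1)

# PRODUCT_ID / RETAIL_PRICE values as in the thread; CATEGORY_ID is hypothetical.
items = pd.DataFrame({
    "PRODUCT_ID": [10162, 10145, 101433],
    "RETAIL_PRICE": [1.75, 1.00, 7.95],
    "CATEGORY_ID": [3, 7, 3],
})
feats = prepare_item_features(items, "PRODUCT_ID", ["RETAIL_PRICE"], ["CATEGORY_ID"])
print(feats.columns.tolist())
# ['PRODUCT_ID', 'RETAIL_PRICE', 'CATEGORY_ID_3', 'CATEGORY_ID_7']
```

The ID column is kept first and left unscaled, on the assumption that rankfm reads it as the item key. The min and max should be computed on the training data and reused when transforming any later data, so the scaled values stay comparable.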

jonathanswalton77, Dec 02 '22