recommenders Ranking products

Hi,

In a ranking model for a web store, what is customary to use as the product rating? In the movie database it is the movie ratings given by users; in a web store there are no product ratings, just transaction data (who purchased what and when).

(Essentially the same question as in https://github.com/tensorflow/recommenders/issues/355 and https://github.com/tensorflow/recommenders/issues/389)

Sep 05 '22 09:09 Ullar-Kask

@Ullar-Kask

You should have data about what was visible to the user each time on the page. Products that were visible to the user and did not result in a click serve as implicit negative examples, while products that were clicked on serve as explicit positive examples. Given additional features for both the user and the items, you predict the probability of each item being clicked on. Then you sort the items by their predicted probabilities.
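To make the labelling and sorting step concrete, here is a minimal pandas sketch. The `impressions` and `clicks` frames, the column names, and the `p_click` scores are all made up for illustration; in practice the scores would come from whatever model you train.

```python
import pandas as pd

# Hypothetical impression log: every product that was shown to a user.
impressions = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "product_id": ["a", "b", "c", "a", "d"],
})
# Hypothetical click log: the subset of impressions that were clicked.
clicks = pd.DataFrame({
    "user_id":    [1, 2],
    "product_id": ["b", "d"],
})

# Left-join impressions with clicks: shown-and-clicked rows get label 1,
# shown-but-not-clicked rows get label 0.
labelled = impressions.merge(
    clicks.assign(label=1), on=["user_id", "product_id"], how="left"
)
labelled["label"] = labelled["label"].fillna(0).astype(int)


def rank_for_user(scored: pd.DataFrame) -> pd.DataFrame:
    """Sort candidates per user by predicted click probability, highest first."""
    return scored.sort_values(["user_id", "p_click"], ascending=[True, False])


# Stand-in scores (assumption); a trained model would produce these.
scored = labelled.assign(p_click=[0.2, 0.9, 0.1, 0.3, 0.7])
ranked = rank_for_user(scored)
```
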

Sep 16 '22 11:09 hkristof03

Thanks for your thoughts! We have neither data about what was visible to the user each time on the page nor click data. As a solution, I am thinking of generating synthetic transaction records as negative samples. Namely, for each transaction record I would generate one (or two, or say, N) synthetic records by combining the true transaction data with a randomly selected item from the set of items the customer has not purchased, and set label=0 for each such record. What do you think of this approach? Might it work? What is a reasonable value of N? Does the purchase frequency of an item play a role when it is used as such a negative sample?
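A rough sketch of this idea in pandas, for anyone who wants to try it. The function name `add_random_negatives`, the column names, and the toy data are assumptions made for illustration. Negatives are drawn uniformly here; weighting by purchase frequency could be tried through the `p` argument of `rng.choice`.

```python
import numpy as np
import pandas as pd


def add_random_negatives(df, catalog, n_neg, seed=0):
    """Add up to n_neg label=0 rows per purchase, drawn from unpurchased items."""
    rng = np.random.default_rng(seed)
    catalog = pd.Series(list(catalog))
    parts = [df.assign(label=1)]  # real transactions are the positives
    for customer_id, purchased in df.groupby("customer_id", sort=False)["product_id"]:
        # Items this customer has never bought.
        candidates = catalog[~catalog.isin(purchased)].to_numpy()
        k = min(n_neg * len(purchased), len(candidates))
        if k == 0:
            continue
        sampled = rng.choice(candidates, size=k, replace=False)
        parts.append(pd.DataFrame(
            {"customer_id": customer_id, "product_id": sampled, "label": 0}
        ))
    return pd.concat(parts, ignore_index=True)


# Toy transaction data (hypothetical).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "product_id":  ["a", "b", "a"],
})
samples = add_random_negatives(transactions, catalog=list("abcdef"), n_neg=2)
```

Sampling without replacement caps the negatives per customer at the number of unpurchased items, so small catalogs yield fewer than N negatives per positive.
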

Sep 17 '22 10:09 Ullar-Kask

@Ullar-Kask Do you have any updates on how your approach worked out? I am facing the same scenario. If you could share some code showing how you managed to pre-process your data into the described format, it would be very helpful.

Oct 27 '23 19:10 JV-Nunes

The approach works, as our testing shows. We generate N negative samples for each positive sample (as mentioned above). The larger the value of N, the better the results. Currently we have N=40, and it is limited by the amount of memory in the server. I am not posting the complete code because it is pretty technical, but in principle we loop over customers with `for customer_id, df_customer in df.groupby('customer_id', sort=False)[['product_id']]:`, generate negative samples for each customer, and "rate" them in the following way (label="rating"):

label=0: a randomly picked product from the product catalog that was neither purchased, clicked, nor recommended (a large source of random products);
label=1: a recommended product that was neither clicked nor purchased (unpurchased within the time frame of T-365 to T-90 days);
label=2: a clicked but unpurchased product;
label=3: a purchased product (these are the positive samples).

For each label=3 record, the products with labels 0 to 2 add up to N in total. You may experiment with swapping labels 0 and 1.
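This is not our production code, but the labelling scheme above can be sketched roughly as follows. The `candidates` frame and its boolean columns are hypothetical, and the T-365 to T-90 day window is not modelled here.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate rows for one customer, with one flag per signal.
candidates = pd.DataFrame({
    "customer_id": [1, 1, 1, 1],
    "product_id":  ["a", "b", "c", "d"],
    "purchased":   [True, False, False, False],
    "clicked":     [True, True, False, False],
    "recommended": [True, False, True, False],
})

# np.select takes the first condition that holds, so the strongest signal
# wins: purchased (3) > clicked (2) > recommended (1) > random (0).
conditions = [
    candidates["purchased"].to_numpy(),
    candidates["clicked"].to_numpy(),
    candidates["recommended"].to_numpy(),
]
candidates["label"] = np.select(conditions, [3, 2, 1], default=0)
```

The resulting `label` column plays the role of the user rating in the movie examples.
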

Here you have the "movie ratings" database ;)

BTW, using cudf instead of pandas speeds up negative sample generation 2x.

Oct 28 '23 14:10 Ullar-Kask

@Ullar-Kask great approach! Thanks for sharing. At the moment I don't have click information, but I do know if the product has been recommended in the past. I'm going to try a scoring system similar to the one you use. As for the model itself, is the one developed in this tutorial a good starting point?

Oct 30 '23 11:10 JV-Nunes
