
Ranking products

Open Ullar-Kask opened this issue 3 years ago • 5 comments

Hi,

In a ranking model for a web store, what is customarily used as the product rating? In the movie database it's the movie ratings given by users, but in a web store there are no product ratings, just transaction data (who purchased what and when).

(Essentially the same question as in https://github.com/tensorflow/recommenders/issues/355 and https://github.com/tensorflow/recommenders/issues/389)

Ullar-Kask avatar Sep 05 '22 09:09 Ullar-Kask

@Ullar-Kask

You should have data about what was visible to the user each time on the page. Products that were visible to the user and did not result in a click serve as implicit negative examples, while products that were clicked on serve as explicit positive examples. Given additional features for both the user and the items, you predict the probability of each item being clicked on. Then you sort the items by their predicted probabilities.
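As a minimal sketch of the two steps described here (labeling impressions, then sorting by predicted probability): the function names, product ids, and scores below are hypothetical, and the scores stand in for the output of whatever model is actually trained.

```python
def label_impressions(shown, clicked):
    """Turn one page view into (product_id, label) training pairs.

    Shown-and-clicked products get label 1, shown-but-not-clicked get 0.
    """
    clicked = set(clicked)
    return [(pid, 1 if pid in clicked else 0) for pid in shown]


def rank_by_score(scores):
    """Sort product ids by predicted click probability, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical page view: four products shown, one clicked.
examples = label_impressions(shown=["p1", "p2", "p3", "p4"], clicked=["p3"])

# Hypothetical model outputs for the same user (not real predictions).
ranking = rank_by_score({"p1": 0.10, "p2": 0.05, "p3": 0.70, "p4": 0.15})
```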

hkristof03 avatar Sep 16 '22 11:09 hkristof03

Thanks for your thoughts! We have neither data about what was visible to the user each time on the page nor click data. As a solution to the problem, I am thinking of generating synthetic transaction data as negative samples. Namely, for each transaction record I would generate one (or two, or say, N) synthetic records by taking the true transaction data, replacing the item with a randomly selected one from the set of items the customer has not purchased, and setting label=0 for the record. What do you think of this approach? Might it work? What is a reasonable value of N? Does the purchase frequency of an item play a role when it is used as such a negative sample?
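The sampling described above could look roughly like the following pandas sketch. It is an illustration under stated assumptions, not code from this thread: the column names `customer_id` and `product_id`, the function name `add_negative_samples`, and the toy data are all hypothetical, and negatives are drawn uniformly without replacement per customer.

```python
import numpy as np
import pandas as pd


def add_negative_samples(df, catalog, n_neg=4, seed=0):
    """Return df (label=1) plus up to n_neg label=0 rows per purchase.

    A customer gets fewer negatives if they have already bought almost
    everything in the catalog.
    """
    rng = np.random.default_rng(seed)
    catalog = np.asarray(catalog)
    parts = [df.assign(label=1)]
    for customer_id, df_customer in df.groupby("customer_id", sort=False):
        purchased = df_customer["product_id"].unique()
        # Only products this customer has never bought can be negatives.
        candidates = np.setdiff1d(catalog, purchased)
        if candidates.size == 0:
            continue
        size = min(candidates.size, n_neg * len(df_customer))
        negatives = rng.choice(candidates, size=size, replace=False)
        parts.append(
            pd.DataFrame(
                {"customer_id": customer_id, "product_id": negatives, "label": 0}
            )
        )
    return pd.concat(parts, ignore_index=True)


# Toy transaction log (made-up data).
df = pd.DataFrame(
    {"customer_id": ["a", "a", "b"], "product_id": ["p1", "p2", "p1"]}
)
catalog = [f"p{i}" for i in range(1, 11)]
out = add_negative_samples(df, catalog, n_neg=2)
```

On the purchase-frequency question, one option to experiment with is passing popularity-based weights through the `p` argument of `rng.choice` instead of sampling uniformly; whether that helps would need to be tested on the actual data.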

Ullar-Kask avatar Sep 17 '22 10:09 Ullar-Kask

@Ullar-Kask Do you have any updates on how your approach worked out? I am facing the same scenario. If you could provide some code showing how you pre-processed your data into the described format, that would be very helpful.

JV-Nunes avatar Oct 27 '23 19:10 JV-Nunes

The approach works, as our testing shows. We generate N negative samples for each positive sample (as mentioned above). The larger the value of N, the better the results. Currently we have N=40, and it's limited by the amount of memory in the server. I am not displaying the complete code because it's pretty technical, but in principle we loop over customers with `for customer_id, df_customer in df.groupby('customer_id', sort=False)[['product_id']]:`, generate negative samples for each customer, and "rate" them in the following way (label="rating"):

  1. label=0: randomly picked unpurchased, unclicked, unrecommended product from the product catalog (a large source of random products);
  2. label=1: recommended but unclicked and unpurchased product (unpurchased within the timeframe of T-365...T-90 days);
  3. label=2: clicked but unpurchased product;
  4. label=3: purchased product (these are the positive samples).

Products with labels 0..2 sum up to N for each label=3 record. You may experiment by switching labels 0 and 1.
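The graded labels above can be sketched for a single purchase as follows. This is a hypothetical illustration, not the code used here: the function name `graded_samples`, its arguments, and the fill order (clicked first, then recommended, then random catalog products) are assumptions, and the T-365...T-90 time window is not modeled.

```python
import random


def graded_samples(purchased, clicked, recommended, catalog, n, rng):
    """Build up to n lower-grade rows for one label=3 (purchased) row.

    Clicked (label 2) and recommended (label 1) products are used first
    because they are scarce; random catalog products (label 0) fill the
    rest of n. The fill order is an assumption of this sketch.
    """
    exclude = set(purchased)
    rows = []
    for label, pool in ((2, clicked), (1, recommended)):
        for pid in pool:
            if pid not in exclude and len(rows) < n:
                rows.append((pid, label))
                exclude.add(pid)
    # Random, previously unseen catalog products become label=0 rows.
    fillers = [pid for pid in catalog if pid not in exclude]
    k = min(n - len(rows), len(fillers))
    rows += [(pid, 0) for pid in rng.sample(fillers, k)]
    return rows


# Toy example with made-up product ids.
rng = random.Random(0)
catalog = [f"p{i}" for i in range(100)]
rows = graded_samples(
    purchased={"p0"},
    clicked=["p1", "p2"],
    recommended=["p2", "p3", "p0"],
    catalog=catalog,
    n=10,
    rng=rng,
)
```

Each `(product_id, label)` pair would then be joined with the customer's features and trained alongside the label=3 row for the purchase itself.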

Here you have the "movie ratings" database ;)

BTW, using cudf instead of pandas speeds up negative sample generation 2x.

Ullar-Kask avatar Oct 28 '23 14:10 Ullar-Kask

@Ullar-Kask great approach! Thanks for sharing. At the moment I don't have click information, but I do know if the product has been recommended in the past. I'm going to try a scoring system similar to the one you use. As for the model itself, is the one developed in this tutorial a good starting point?

JV-Nunes avatar Oct 30 '23 11:10 JV-Nunes