Two Bugs regarding YoutubeDNN

Open · NeverSalar opened this issue 2 years ago · 2 comments

There are two bugs in the code for the YoutubeDNN model.

  1. gen_data_set_youteube has a typo... it should be youtube. (Not necessarily a bug, lol.)

  2. Here's the first bug: gen_data_set_youteube produces negative samples ONLY, without any positive samples. Consequently, all training labels are 0.

  3. The second one: [neg_list[item_idx] for item_idx in np.random.choice(neg_list, negsample)] is not correct. np.random.choice(neg_list, negsample) draws values from neg_list, not positions, so using those values as indices can go out of bounds. It should sample the indexes directly (see the sketch below).
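
A minimal sketch of the indexing concern in point 3. The names neg_list and negsample follow the issue; the data values and the "safe" variant are assumptions for illustration only:

```python
import numpy as np

np.random.seed(0)  # deterministic for the demo

# Hypothetical data: every id here is >= len(neg_list), so the questioned
# pattern is guaranteed to go out of bounds. With real data it would only
# fail sometimes, matching an intermittent out-of-boundary error.
neg_list = np.array([10, 42, 7, 99])  # candidate negative item ids
negsample = 2

# The questioned pattern: np.random.choice(neg_list, n) draws VALUES from
# neg_list, and those values are then reused as indices into neg_list.
try:
    picks = [neg_list[item_idx] for item_idx in np.random.choice(neg_list, negsample)]
except IndexError as e:
    print("out of bounds:", e)

# The pattern above is only safe when every id in neg_list is itself a
# valid position (0..len-1), i.e. when ids are normalized to indexes,
# which is what the maintainer confirms below.

# Sampling positions instead is safe for arbitrary id values:
picks = [neg_list[i] for i in np.random.choice(len(neg_list), negsample)]
print(picks)
```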

NeverSalar commented Jul 30 '22 03:07

  1. You should also look at the YouTubeDNN model file. In gen_data_set_youtube, what we generate is [1, 0, 0, 0, ...]: we set the first one as positive and the rest as negatives to simulate SampledSoftmax (a sketch follows below). For more information, you can see here.
  2. Maybe it is a little confusing, but it truly is the index of items, because we normalize the item ids to indexes.
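
A rough sketch of the labeling scheme described in point 1. make_group is a hypothetical helper, not the repository's actual gen_data_set_youtube; the pool and sample sizes are made up:

```python
import numpy as np

# Hypothetical helper (not the repo's real implementation): for each
# positive item, attach `negsample` sampled negatives, putting the true
# item first so the per-group labels come out as [1, 0, 0, ...].
def make_group(pos_item, item_pool, negsample, rng=None):
    rng = rng or np.random.default_rng(0)
    # Note: a real implementation should also exclude the positive item
    # (and the user's other clicked items) from the negative draws.
    neg_idx = rng.choice(len(item_pool), size=negsample, replace=False)
    items = [pos_item] + [item_pool[i] for i in neg_idx]
    labels = [1] + [0] * negsample  # first = positive, rest = negatives
    return items, labels

items, labels = make_group(pos_item=12, item_pool=list(range(100)), negsample=4)
print(items)   # positive item first, then 4 sampled negatives
print(labels)  # [1, 0, 0, 0, 0]
```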

Hope for your reply~

bbruceyuan commented Jul 30 '22 08:07

Thanks for the explanation! Now it makes sense to me why all the labels are 0. It would be helpful to add some comments in preprocessing.py to prevent confusion in the future!

I am still a bit unsure about the issue with the negative sampling. Previously, when I ran the code, an out-of-boundary error occurred. I will let you know if I have updates.

NeverSalar commented Jul 30 '22 19:07