RecBole [🐛BUG] Describe your problem in one sentence.


QinHsiu opened this issue.

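Why is the training set expanded (augmented) for all sequential recommendation models? I hope your team can clear up this confusion. Thank you very much.

```python
for i, uid in enumerate(self.inter_feat[self.uid_field].numpy()):
    if last_uid != uid:
        # First interaction of a new user: start a new sequence here.
        last_uid = uid
        seq_start = i
    else:
        # Keep the history window at most max_item_list_len items long.
        if i - seq_start > max_item_list_len:
            seq_start += 1
        # Interaction i becomes a target; items [seq_start, i) are its history.
        uid_list.append(uid)
        item_list_index.append(slice(seq_start, i))
        target_index.append(i)
        item_list_length.append(i - seq_start)
```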
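To make the quoted loop easier to follow, here is a minimal standalone sketch on plain Python lists. `augment_sequences` is a hypothetical helper, not part of RecBole. It assumes the interactions are already sorted by user and then by time, and it right-pads each history with item id 0 to a fixed length; that padding convention is an assumption for illustration, since the snippet above only records slices and lengths and does not show how RecBole materializes the padded tensors.

```python
from typing import List, Tuple


def augment_sequences(
    user_ids: List[int],
    item_ids: List[int],
    max_item_list_len: int,
    pad_id: int = 0,
) -> List[Tuple[int, List[int], int, int]]:
    """Expand time-sorted interactions into (user, padded history, target, length) rows.

    Mirrors the quoted loop: every interaction except a user's first becomes a
    target, and the items before it (at most max_item_list_len) form its history.
    """
    rows = []
    last_uid = None
    seq_start = 0
    for i, uid in enumerate(user_ids):
        if last_uid != uid:
            # First interaction of a new user: start a new sequence, emit nothing.
            last_uid = uid
            seq_start = i
        else:
            # Slide the window so the history never exceeds max_item_list_len.
            if i - seq_start > max_item_list_len:
                seq_start += 1
            history = item_ids[seq_start:i]
            # Right-pad with pad_id to a fixed length (assumed convention).
            padded = history + [pad_id] * (max_item_list_len - len(history))
            rows.append((uid, padded, item_ids[i], len(history)))
    return rows
```

For example, with user 1 clicking items 10, 11, 12, 13 and user 2 clicking items 20, 21, `augment_sequences([1, 1, 1, 1, 2, 2], [10, 11, 12, 13, 20, 21], max_item_list_len=2)` yields four rows: `(1, [10, 0], 11, 1)`, `(1, [10, 11], 12, 2)`, `(1, [11, 12], 13, 2)` and `(2, [20, 0], 21, 1)`. A user with n interactions contributes n - 1 samples instead of one.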

May 30 '22 13:05 QinHsiu

@QinHsiu Hello, and thank you for your interest in and support of RecBole!

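1. Sequential recommendation models usually need a large amount of training data to perform well; without this kind of data augmentation, there may not be enough sequence data;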

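2. Sequences are usually of variable length: some users find what they want on the first click, while others need many clicks to get there, so every click in a sequence carries information. We therefore treat the click at every timestamp as a training sample, and use max_item_list_len together with padding to turn each sequence into a fixed-length input. This follows the approach of "Improved Recurrent Neural Networks for Session-based Recommendations";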

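3. If you want a different data augmentation scheme, you can modify our data_augmentation function. If you do not want data augmentation at all, you can preprocess the data in advance and use the benchmark_filename parameter to load the preprocessed (custom-split) data directly; see session_based_rec_example.py for details.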
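One way to prepare non-augmented, pre-split data yourself is sketched below: a leave-one-out split that produces exactly one train, one valid and one test row per user. The helper names `leave_one_out_rows` and `write_inter_file`, the header `user_id:token item_id_list:token_seq item_id:token`, the tab/space separators, and the idea of saving the results as `<dataset>.train.inter`, `<dataset>.valid.inter` and `<dataset>.test.inter` are all assumptions modeled on RecBole's atomic-file conventions; check session_based_rec_example.py and the files it loads for the exact field names before relying on this.

```python
import csv
import io
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (user id, space-separated history, target item)


def leave_one_out_rows(
    interactions: Iterable[Tuple[str, str]],
    max_item_list_len: int,
) -> Dict[str, List[Row]]:
    """Build one row per user for each split, with no augmentation.

    `interactions` are (user, item) pairs sorted by user and then by timestamp.
    """
    splits: Dict[str, List[Row]] = {"train": [], "valid": [], "test": []}
    for uid, group in groupby(interactions, key=lambda pair: pair[0]):
        items = [item for _, item in group]
        if len(items) < 4:
            continue  # need at least one history item before the train target
        # Third-to-last item -> train, second-to-last -> valid, last -> test.
        targets = (("train", len(items) - 3), ("valid", len(items) - 2), ("test", len(items) - 1))
        for name, pos in targets:
            history = items[:pos][-max_item_list_len:]  # keep the most recent items
            splits[name].append((uid, " ".join(history), items[pos]))
    return splits


def write_inter_file(rows: List[Row]) -> str:
    """Serialize rows as tab-separated text with an assumed RecBole-style header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["user_id:token", "item_id_list:token_seq", "item_id:token"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Note that this split gives only one training row per user, which is exactly the data-volume trade-off described in point 1.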

May 31 '22 01:05 Wicknight

Closing this issue because there have been no new replies for a long time. Feel free to comment if you have further questions.

Nov 27 '22 05:11 Wicknight

If the dataset is large, such as MovieLens 20M or larger, is data augmentation still needed? This step requires a lot of memory. I am wondering at what dataset size data augmentation is no longer necessary.
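A rough way to see why this step is memory-hungry: per the quoted loop, augmentation creates one sample per interaction except each user's first, and each sample stores an item list padded to max_item_list_len. The back-of-the-envelope sketch below assumes the published MovieLens 20M size (20,000,263 ratings from 138,493 users) with no filtering, max_item_list_len = 50, and 8-byte integer ids, and it counts only the padded item-list field; any other per-sample list fields (for example timestamps) would each add a similar amount. The function name `augmented_list_bytes` is hypothetical.

```python
def augmented_list_bytes(
    num_interactions: int,
    num_users: int,
    max_item_list_len: int,
    bytes_per_id: int = 8,
) -> int:
    """Estimate the memory of one padded list field after sequence augmentation."""
    # Every interaction except each user's first becomes one training sample.
    num_samples = num_interactions - num_users
    # Each sample stores a list padded to max_item_list_len ids.
    return num_samples * max_item_list_len * bytes_per_id


# Assumed inputs: MovieLens 20M, no filtering, max_item_list_len = 50, 8-byte ids.
estimate = augmented_list_bytes(20_000_263, 138_493, 50)
print(f"{estimate / 1e9:.2f} GB")  # prints "7.94 GB"
```

Under these assumptions the item-list field alone is about 7.9 GB, and the cost grows linearly with both the number of interactions and max_item_list_len.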

Jul 21 '23 21:07 night18
