[FEA] Negative sampling for positive-only datasets
Motivation
Public datasets are generally provided with negative samples, which makes it easier to train and compare results across algorithms. However, the most common industry use case is a dataset containing only the users' interactions (positive-only), as items the user might have seen but did not interact with are usually not logged. Most modern neural architectures need negative candidates for efficient training, because the item catalog of a large-scale recsys is in the order of millions.
Requirements
RQ01 - Be Available in both NVT Pre-processing and Data Loading
The candidate sampling should be primarily performed by the NVT Data Loader, so that each positive sample can get different negative samples across epochs. But it should also be available during pre-processing, for cases where you would like to persist fixed negative samples to compare different training algorithms that might not use the NVT Data Loader.
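To illustrate the per-epoch behavior described above, here is a minimal numpy sketch (the function name and signature are hypothetical, not the NVTabular API): the sampler is re-invoked each epoch, so the same positives receive fresh negatives.

```python
import numpy as np

def sample_negatives(positive_items, catalog_size, n_negatives, rng):
    """Hypothetical helper: draw uniform negatives for each positive
    interaction. Calling it again (e.g. at the next epoch) yields
    different negatives for the same positives."""
    negatives = rng.integers(0, catalog_size, size=(len(positive_items), n_negatives))
    # Resample any accidental collisions with the positive item itself.
    for row, pos in enumerate(positive_items):
        mask = negatives[row] == pos
        while mask.any():
            negatives[row, mask] = rng.integers(0, catalog_size, size=mask.sum())
            mask = negatives[row] == pos
    return negatives

rng = np.random.default_rng(42)
positives = np.array([3, 7, 1])
epoch1 = sample_negatives(positives, catalog_size=100, n_negatives=4, rng=rng)
epoch2 = sample_negatives(positives, catalog_size=100, n_negatives=4, rng=rng)
```

Persisting `epoch1` to disk would correspond to the fixed pre-processing variant; redrawing per epoch corresponds to the data-loader variant.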
RQ02 - Feature Sets config
Provide a configuration of feature sets to bring RecSys taxonomy to some important features. That configuration will be used during NVT pre-processing, and should be persisted so that it is also available to the NVT Data Loaders and to custom training/eval scripts. The minimum feature sets to allow candidate sampling managed by NVT and temporal dataset split are:
- Item id feature - Used by candidate sampling, as it is the key that represents a candidate item.
- Item metadata features - Used by candidate sampling when item metadata attributes are fed as input features for recommendation (for hybrid recommendation architectures like W&D and DLRM), because those item metadata features must be provided for both positive and negative samples.
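A feature-set configuration along these lines could be a small serializable mapping from taxonomy roles to dataset columns. The keys and column names below are illustrative assumptions, not the actual NVTabular schema:

```python
import json

# Hypothetical feature-set config (RQ02): maps RecSys roles to columns.
feature_sets = {
    "item_id": "item_id",                             # key used by candidate sampling
    "item_metadata": ["category", "brand", "price"],  # joined onto negatives as well
    "timestamp": "event_time",                        # enables temporal split / recency
}

# Persisted during pre-processing so data loaders and custom scripts can reuse it.
serialized = json.dumps(feature_sets)
```

Persisting the config as JSON keeps it readable by both the NVT Data Loader and framework-agnostic training/eval scripts.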
RQ03 - Recommendable items set
Provide the following methods to form the recommendable items set, composed of items that were available to users at a given point in time, to be considered as valid negative samples:
- Global - All items in the dataset are considered recommendable
- Temporal - For a given training or eval batch:
  - Past - All previously observed items are recommendable
  - Recent - All items with events observed within the last N minutes/hours/days are recommendable
  - Recent batches - Only items within the current batch or previous batches (a buffer) are recommendable, assuming that batches are mildly sorted by time
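The "Recent batches" option above can be sketched with a small buffer of past batches. The class name and interface are hypothetical, not part of NVTabular:

```python
from collections import deque
import numpy as np

class RecentBatchesSampler:
    """Hypothetical sketch of the 'Recent batches' recommendable set:
    only items seen in the current batch or the last `buffer_size`
    batches are eligible as negatives, assuming batches are mildly
    sorted by time."""

    def __init__(self, buffer_size, rng=None):
        self.buffer = deque(maxlen=buffer_size)
        self.rng = rng or np.random.default_rng()

    def sample(self, batch_item_ids, n_negatives):
        # Recommendable set = current batch ∪ buffered past batches.
        pool = np.unique(
            np.concatenate([batch_item_ids, *self.buffer])
            if self.buffer else batch_item_ids
        )
        self.buffer.append(np.asarray(batch_item_ids))
        return self.rng.choice(pool, size=n_negatives, replace=True)

sampler = RecentBatchesSampler(buffer_size=2, rng=np.random.default_rng(0))
negs = sampler.sample(np.array([1, 2, 3]), n_negatives=5)
# negs can only contain 1, 2, or 3: only the current batch has been seen so far
```

Because the buffer is bounded, older items age out automatically, which approximates a sliding time window when batches are roughly chronological.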
RQ04 - Sampling methods
Provide the following methods for negative sampling from the recommendable items set:
- Uniform sampling - All recommendable items have the same probability of being sampled
- Recency sampling - Fresh items have a higher probability of being sampled. This one requires keeping a table with the first timestamp at which each item was seen (i.e. its “release” date), to compute the “age” of the item at a given point in time
- Popularity sampling - Probability is the item’s past popularity (normalized by the popularity of all other items).
- Recent Popularity sampling - Probability is the item’s relative popularity within a recent time frame (e.g. 1 hour / day / week)
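As a sketch of the "Recent Popularity" option, the sampling distribution can be computed from interaction counts restricted to a recent time window. The helper below is illustrative (names and signature are assumptions, not library API):

```python
import numpy as np

def popularity_probs(item_ids, timestamps, now, window):
    """Hypothetical helper for 'Recent Popularity sampling' (RQ04):
    probability proportional to an item's interaction count within the
    last `window` time units, normalized over all recent items."""
    recent = timestamps >= (now - window)
    items, counts = np.unique(item_ids[recent], return_counts=True)
    return items, counts / counts.sum()

item_ids = np.array([1, 1, 2, 3, 3, 3])
ts = np.array([10, 50, 55, 60, 61, 5])
items, probs = popularity_probs(item_ids, ts, now=62, window=15)
# within the last 15 time units: item 1 (x1), item 2 (x1), item 3 (x2)
# → probs [0.25, 0.25, 0.5]
```

The same helper with `window` set to the full dataset span reduces to plain Popularity sampling.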
References: Doc - NVTabular - Requirements on pre-processing for session-based recommendation and candidate sampling
As a side note, I recently read the paper "Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison" from RecSys 2020, where they perform a rigorous evaluation of many algorithms, datasets, preprocessing strategies, loss functions and negative sampling strategies. In section 3.3, they show that Uniform Sampling, although simple, usually produced models with better accuracy than Popularity negative sampling.
It is important to note that their models use only user and item ids (CF). But for models leveraging additional features (e.g. item popularity, target encoding of item id), such features could leak which samples are the positives (usually popular items) and which are the negatives (usually unpopular items, if uniformly sampled).
Thus, it is also important to provide popularity-based negative sampling in this feature, and maybe a setting to control the percentage of negative items sampled from the uniform and from the popularity distributions, like in this paper (Section 4.1).
I have implemented an example of sampling with cuDF, where you can set a continuous parameter which ranges between 0.0 (uniform sampling) and 1.0 (popularity sampling). This provides more flexibility to the user, and might be a hyperparameter in the training pipeline.
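The blend described above can be written as a convex combination of the two distributions. This is a numpy rendering of the idea, not the original cuDF code:

```python
import numpy as np

def blended_sampling_probs(item_counts, alpha):
    """Blend uniform and popularity distributions with one knob:
    alpha=0.0 → uniform sampling, alpha=1.0 → popularity sampling.
    (Illustrative sketch; the issue author's version uses cuDF.)"""
    popularity = item_counts / item_counts.sum()
    uniform = np.full_like(popularity, 1.0 / len(item_counts))
    return (1 - alpha) * uniform + alpha * popularity

counts = np.array([8.0, 1.0, 1.0])
uni = blended_sampling_probs(counts, 0.0)   # [1/3, 1/3, 1/3]
pop = blended_sampling_probs(counts, 1.0)   # [0.8, 0.1, 0.1]
half = blended_sampling_probs(counts, 0.5)  # halfway between the two
```

Since both inputs are valid distributions, any `alpha` in [0, 1] yields probabilities that still sum to 1, so the knob can be tuned freely as a hyperparameter.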
Side note: I have a lot of question marks about negative sampling strategies, loss functions, and offline evaluation after reading "How Sensitive is Recommendation Systems’ Offline Evaluation to Popularity?" Although the paper is framed as being about evaluation, I think it's also revealing about the impact of different sampling strategies (e.g. BPR vs. WARP) on popularity-related biases. This is an area I'd love to explore and understand better.
Hey! Not sure if I have the "right" to comment on this FEA as a simple library user, but I was curious whether this was ever implemented. We are especially interested in the implementation of the so-called time-based recommendable items set, to generate realistic negative samples. Looking at the current docs of the library for NVT and Merlin Models, I didn't see any clear evidence that this was implemented; can you confirm?
@guillaume-chech for sampling please see the models lib; for example, we have in-batch negative sampling implemented.
Hey @rnyak, yes, I did have a look at this; it's a fair approximation, yet not equivalent to what is described in RQ03. So I guess this was not implemented.