[Task] Improve negative sampling for retrieval
Problem:
There is a considerable amount of tech-debt in the code that’s responsible for negative sampling in retrieval. The biggest issue is that it relies heavily on the model-context, which we would like to remove in order to simplify things.
Goal:
- Simplify the base-class `PredictionTask`
- Remove the usage of model-context in the prediction-tasks
- Generalize sampling-queues to allow for ranking
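
To make the second goal concrete, here is a minimal sketch of what a sampler without model-context could look like: everything the sampler needs is passed in explicitly. This is an illustration only, not the Merlin Models implementation. The names `Candidates`, `NegativeSampler`, `InBatchSampler` and `dot_product_scores` are hypothetical, and plain Python lists stand in for tensors.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Candidates:
    """Hypothetical container: candidate item ids and their embeddings."""
    ids: List[int]
    embeddings: List[List[float]]


class NegativeSampler(ABC):
    """Hypothetical base class. Inputs are passed to `sample` explicitly,
    so the sampler does not read from a shared model-context object."""

    @abstractmethod
    def sample(self, positives: Candidates, n: int) -> Candidates:
        ...


class InBatchSampler(NegativeSampler):
    """Reuses items from the current batch as a pool of negatives."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)

    def sample(self, positives: Candidates, n: int) -> Candidates:
        # Never ask for more negatives than the batch contains.
        k = min(n, len(positives.ids))
        picked = self._rng.sample(range(len(positives.ids)), k)
        return Candidates(
            ids=[positives.ids[i] for i in picked],
            embeddings=[positives.embeddings[i] for i in picked],
        )


def dot_product_scores(query: List[float], negatives: Candidates) -> List[float]:
    """Scores one query embedding against each sampled negative."""
    return [sum(q * e for q, e in zip(query, emb)) for emb in negatives.embeddings]


# Usage: sample two negatives from a batch of three items and score them.
batch = Candidates(
    ids=[10, 11, 12],
    embeddings=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
negatives = InBatchSampler(seed=0).sample(batch, n=2)
scores = dot_product_scores([0.5, 2.0], negatives)
```

A real sampler would also need to mask negatives that collide with the query's own positive item; that step is left out here to keep the sketch short.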
Constraints:
- Keep user-facing API the same.
Starting Point:
- [x] Implement `PredictionBlock`, `BinaryPrediction` & `RegressionPrediction`: Done
- [x] Implement a new negative-sampler base class: Done
- [x] Implement `DotProduct`: Done
- [x] Implement `ContrastivePredictionBlock` + tests: Done
- [x] Implement `DotProductPrediction`: Done
- [ ] Tests for two-tower/mf with `BinaryPrediction` & `RegressionPrediction`
- [x] Implement `CategoricalPrediction` (without contrastive capabilities): Done
- [x] Make `CategoricalPrediction` allow for negative-sampling: Done
For later release:
- [ ] Update retrieval models with new `DotProductCategoricalPrediction` block. Note: This is pushed to the next release as it depends on another task not captured in this ticket, which is the refactoring of the top-k recommender model: https://github.com/NVIDIA-Merlin/models/issues/622
- Implement different types of samplers one by one
@EvenOldridge, is this getting added to the 22.08 POR?
This RMP ticket has been prioritized over merlin dependencies used in T4Rec
@sararb, are all the unchecked boxes on track for 22.08? Let me know which ones are moving to a later release.
@viswa-nvidia I have just updated the ticket with the tasks that are moving to later release.