[Task] Finalize YouTubeDNN retrieval model
Problem:
The YouTubeDNN retrieval model is one of the most popular approaches for recommendation domains such as e-commerce, media, and news, which either suffer from the user cold-start problem or have user preferences that change a lot over time. YouTubeDNN differs from other retrieval models such as Two-Tower and MF in that it represents the user by the sequence of their most recent interactions rather than learning an embedding per user. YouTubeDNN will be the first model in Merlin Models to support sequences of interactions, and it provides building blocks that will be useful for future RNN and Transformer support. Most of the work for full YouTubeDNN support is done, but some tasks are still pending.
Goal:
- Finish the YouTubeDNN retrieval model in order to:
  - Start supporting sequential interactions in Merlin Models, as expected by many of our customers, as a first step towards upcoming support for more advanced sequential and session-based recommendation.
  - Enable us to provide a more comprehensive benchmark of retrieval models, i.e. comparing YouTubeDNN with MF and Two-Tower.
Constraints:
- YouTubeDNN should inherit from 'RetrievalModel' like MF and Two-Tower, so that the evaluation metrics can be computed over all items using TopkIndex.
- The current YouTubeDNN implementation uses a popularity-based sampler for sampled softmax, which assumes that item ids were categorified so that ids are sorted by decreasing frequency (as done by default by the NVTabular Categorify() op).
- Target generation is done as a pre-processing step using Pandas (set different cutting points in the user's past interactions and reserve the last item as the target).
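
For discussion, the two pre-processing assumptions in the constraints above can be sketched in plain Pandas. This is a minimal illustration, not the code from the linked PRs: the helper names `encode_by_frequency` and `make_examples`, the column names (`user_id`, `item_id`, `timestamp`), and the choice to leave id 0 unused are all assumptions made for the example, and the first helper only imitates a frequency-ordered encoding rather than calling NVTabular.

```python
import pandas as pd


def encode_by_frequency(item_ids: pd.Series) -> pd.Series:
    """Map raw item ids to integers so that smaller ids are more frequent.

    Hypothetical stand-in for a frequency-ordered categorical encoding.
    Ids start at 1; 0 is left unused here (an assumption, e.g. for padding).
    """
    counts = item_ids.value_counts()  # sorted by count, most frequent first
    mapping = {raw: rank for rank, raw in enumerate(counts.index, start=1)}
    return item_ids.map(mapping)


def make_examples(
    interactions: pd.DataFrame,
    user_col: str = "user_id",
    item_col: str = "item_id",
    time_col: str = "timestamp",
    min_history: int = 1,
) -> pd.DataFrame:
    """Build (history, target) rows from each user's time-ordered interactions.

    For every cutting point, the items before the cut become the input
    sequence and the item at the cut is reserved as the target.
    """
    rows = []
    ordered = interactions.sort_values([user_col, time_col])
    for user, items in ordered.groupby(user_col)[item_col]:
        seq = items.tolist()
        # A user with n interactions yields n - min_history examples.
        for cut in range(min_history, len(seq)):
            rows.append(
                {user_col: user, "history": seq[:cut], "target": seq[cut]}
            )
    return pd.DataFrame(rows, columns=[user_col, "history", "target"])
```

A user with interactions `b, c, a` (in time order) would produce two rows, `[b] -> c` and `[b, c] -> a`. Note that generating every cutting point multiplies the dataset size, so a real pipeline may want to cap the history length or sample the cuts.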
Starting Point:
- [ ] https://github.com/NVIDIA-Merlin/models/issues/496
- [ ] https://github.com/NVIDIA-Merlin/models/pull/473
- [ ] https://github.com/NVIDIA-Merlin/models/pull/454
- [ ] https://github.com/NVIDIA-Merlin/models/issues/540
The problem statement doesn't articulate what customer value is being provided.
@viswa-nvidia I have refined the problem description a bit to highlight the value of the YouTubeDNN support
This will be assigned to Edward
The planned tasks don't seem to cover the problem statement. @EvenOldridge, please review.
@marcromeyn, what is this task blocked on?