Merlin
[RMP] Recsys Tutorial & Demo - Flesh out the multi-stage recommender example architecture
Problem:
Customers need a clear example of a multi-stage recommender pipeline that they can follow and use to create their own versions. The upcoming Recsys tutorial will be a public sharing of this example and serves as the deadline for its completion.
Goal:
- Provide a clear example of a multi stage recommender pipeline that does retrieval, filtering, ranking and ordering.
- Highlight NVTabular, our dataloader, Merlin models, and Merlin systems and how they work together.
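The four stages named in the goal can be sketched as a toy pipeline. This is only an illustration in plain Python, not the Merlin implementation: the item vectors, the `SEEN` history, and every function name are hypothetical, the "ranking model" is a stand-in dot product, and ordering is a plain sort by score.

```python
# Toy four-stage recommender: retrieval -> filtering -> ranking -> ordering.
# All data and names are hypothetical; no Merlin APIs are used here.

ITEM_VECTORS = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.7, 0.3)}
SEEN = {"u1": {"a"}}  # items each user has already interacted with


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def retrieve(user_vec, k):
    """Retrieval: take the k items closest to the user vector."""
    by_score = sorted(ITEM_VECTORS, key=lambda i: dot(user_vec, ITEM_VECTORS[i]), reverse=True)
    return by_score[:k]


def filter_seen(user_id, candidates):
    """Filtering: drop candidates the user has already seen."""
    seen = SEEN.get(user_id, set())
    return [i for i in candidates if i not in seen]


def rank(user_vec, candidates):
    """Ranking: score each remaining candidate (stand-in for a ranking model)."""
    return {i: dot(user_vec, ITEM_VECTORS[i]) for i in candidates}


def order(scores, n):
    """Ordering: return the final top-n list, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def recommend(user_id, user_vec, k=3, n=2):
    candidates = retrieve(user_vec, k)
    candidates = filter_seen(user_id, candidates)
    return order(rank(user_vec, candidates), n)


recommendations = recommend("u1", (1.0, 0.0))
```

Keeping each stage as its own function mirrors the idea that each stage can be swapped or deployed independently.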
Constraints:
- NVTabular doesn't support splitting of workflows for user and item features. Two separate workflows for the data are needed to process each.
Starting Point:
- [ ] https://github.com/NVIDIA-Merlin/systems/issues/99 (This has been removed from scope)
- [ ] #458
- [ ] #449 - This is an optional requirement (removed)
@karlhigley hello. any updates on the estimated deadline for adding NVT to ensemble graph? we would be needing this feature for RecSys tutorial.
So, based on our conversation the other day, I'm not sure we actually do need NVT in the ensemble graph in order to send raw data in the request, since it should be possible to map between raw ids and categorified ids just with the way we ingest data into Feast and the way we build the ensemble graph.
The other option, since Merlin Systems already has an operator for running an NVT Workflow, would be to construct separate Workflows for user and item features, join together the results for offline model training, and run those Workflows independently at serving time using the existing NVT Workflow operator in Systems.
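A minimal sketch of this second option, written in plain Python rather than with NVTabular so that no library calls are assumed. The data, `fit_categorify`, `transform_users`, and `transform_items` are all hypothetical stand-ins for two separately fitted Workflows:

```python
# Two independent "workflows" (hypothetical stand-ins for NVT Workflows):
# one for user features, one for item features.

def fit_categorify(values):
    """Fit a raw value -> integer id vocabulary (0 is reserved for unknowns)."""
    vocab = {}
    for v in values:
        vocab.setdefault(v, len(vocab) + 1)
    return vocab


users = [{"user_id": "alice", "age": 31}, {"user_id": "bob", "age": 45}]
items = [{"item_id": "sku-9", "category": "shoes"}, {"item_id": "sku-3", "category": "hats"}]
interactions = [("alice", "sku-3"), ("bob", "sku-9")]

user_vocab = fit_categorify(u["user_id"] for u in users)
item_vocab = fit_categorify(i["item_id"] for i in items)


def transform_users(rows):
    """User workflow: encode user ids, keyed by raw id for joining."""
    return {r["user_id"]: {"user_id": user_vocab.get(r["user_id"], 0), "age": r["age"]} for r in rows}


def transform_items(rows):
    """Item workflow: encode item ids, keyed by raw id for joining."""
    return {r["item_id"]: {"item_id": item_vocab.get(r["item_id"], 0), "category": r["category"]} for r in rows}


# Offline: run both workflows, then join their outputs on the interactions.
user_feats = transform_users(users)
item_feats = transform_items(items)
training_rows = [{**user_feats[u], **item_feats[i]} for u, i in interactions]

# Serving: only the user workflow runs on the incoming request.
request_feats = transform_users([{"user_id": "bob", "age": 45}])["bob"]
```

Because each vocabulary is fitted on its own table, the user side can run per request while the item side is applied ahead of time.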
Based on the understanding that there are two approaches for doing this already available, we're not actively working on adding any new functionality to better support NVT in the ensemble graph right now (which would likely require some fundamental capabilities to build/label/extract sub-graphs from a Workflow to do effectively).
(If neither of those approaches work, then we should talk about getting this epic issue prioritized on the roadmap. @EvenOldridge would be a good person to talk to about that.)
The second option is probably the best fit right now. I agree it would be nice to be able to automatically split the graph into user and item aspects and then deploy each of them separately, but it's not possible without subgraphs in NVT and as Karl says there are straightforward workarounds.
@rnyak can you update the issue for the tutorial to include this work of splitting the user and item feature workflows.
@rnyak, could you please provide details such as the problem, goal, constraints and starting point at the top? Some of this info may be available in the comments; please summarize it at the top. If you are facing any difficulties please let me know.
Updated the ticket. @rnyak can you make sure that all next steps are properly captured.
@karlhigley how is this different from e2e example we have here? Will this be an enhancement to e2e example e.g. adding filtering w/ bloom filters?
Add a filtering stage (Bloom filters for previous interactions)? Is this in scope? Bloom filtering is not in scope. The demo paper in RecSys should have filtering, but we need a better definition of the filtering stage. We should have a simple implementation for the demo (it need not be Bloom filters). We will use filtering by category; we need to check if the dataset will support this. The H&M dataset has category features.
@rnyak are there categories we could filter on, or should we look into adding a created 'in season/stock' column that we could filter on?
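For illustration, a simple non-Bloom filtering stage covering both ideas (an existing category feature, or a created in-stock column) might look like the sketch below. The `ITEMS` table, its column names, and `filter_candidates` are hypothetical and are not taken from the H&M dataset or from Merlin Systems.

```python
# Hypothetical item metadata; "in_stock" stands in for a created column.
ITEMS = {
    "sku-1": {"category": "shoes", "in_stock": True},
    "sku-2": {"category": "hats", "in_stock": True},
    "sku-3": {"category": "shoes", "in_stock": False},
}


def filter_candidates(candidates, category=None, require_in_stock=False):
    """Keep retrieved candidates that match the category and/or stock rule."""
    kept = []
    for item_id in candidates:
        meta = ITEMS.get(item_id)
        if meta is None:
            continue  # unknown item: drop it
        if category is not None and meta["category"] != category:
            continue
        if require_in_stock and not meta["in_stock"]:
            continue
        kept.append(item_id)  # retrieval order is preserved
    return kept


by_category = filter_candidates(["sku-3", "sku-1", "sku-2"], category="shoes")
by_stock = filter_candidates(["sku-3", "sku-1", "sku-2"], require_in_stock=True)
```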
Pushing to arbitration - Examples, documentation, work in core, systems is not captured here.
I am following up on the discussion. I apologize for missing the discussion
@sohn21c - The minimum, we want to present, is the example you linked.
Provide a clear example of a multi stage recommender pipeline that does retrieval, filtering, ranking and ordering.
I think we need to discuss that goal. When we discussed the proposal for the RecSys2022 Tutorial, we had the filtering stage as optional. We did not talk about ordering. I am happy to include them, if the features are available. Do we have a ticket for implementing an ordering op in Merlin Systems?
What we discussed was that the must-haves are:
- Solve the issue with reverse categorical mapping from categorified id back to the original value #458
- [FEA] Add NVT model to ensemble DAG of the multi-stage-recsys pipeline to send raw data as a request to TIS
The latter doesn't necessarily require an NVTabular model, though. The goal is that we send raw user IDs to the endpoint.
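The first must-have, mapping categorified ids back to their original values, amounts to keeping the inverse of the encoding vocabulary. A minimal sketch in plain Python, with hypothetical ids and a hypothetical `build_vocab` helper (it does not show how #458 implements this):

```python
def build_vocab(raw_ids):
    """Return (encode, decode) dicts; ids start at 1, in first-seen order."""
    encode = {}
    for raw in raw_ids:
        encode.setdefault(raw, len(encode) + 1)
    decode = {idx: raw for raw, idx in encode.items()}
    return encode, decode


item_encode, item_decode = build_vocab(["sku-9", "sku-3", "sku-9", "sku-7"])

# The model works with categorified ids; the response should carry raw ids.
model_output = [3, 1]  # hypothetical ranked item ids from the model
raw_items = [item_decode[i] for i in model_output]
```

The same pair of dictionaries serves the request side: the raw user ID in the request is encoded before it reaches the model.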
@jperez999 to check with @bschifferer and @rnyak regarding an op to look up values and merge them back. If required, a ticket has to be created.
https://github.com/NVIDIA-Merlin/Merlin/pull/474 <--- This PR shows how it's done. It shows the initial request coming from the raw user id and then it shows the inference response has the appropriate raw item id. Should be enough to show how to get this working. @rnyak @bschifferer @EvenOldridge @karlhigley
Updated the title to include the scope of code changes required by the RecSys Demo
@rnyak, do you have a doc that captures the details of the planning for the RecSys tutorial?
https://docs.google.com/document/d/14iQH9mA_SR3HJLqaisB6wybmIJidT2yQRSNUslqVy6U/edit
moved pending tasks to eng-improvement. Closing this ticket as Recsys work is complete