
[RMP] Recsys Tutorial & Demo - Flesh out the multi-stage recommender example architecture

Open karlhigley opened this issue 2 years ago • 17 comments

Problem:

Customers need a clear example of a multi-stage recommender pipeline that they can follow and use to create their own versions. The upcoming Recsys tutorial will be a public sharing of this example and serves as the deadline for its completion.

Goal:

  • Provide a clear example of a multi stage recommender pipeline that does retrieval, filtering, ranking and ordering.
  • Highlight NVTabular, our dataloader, Merlin models, and Merlin systems and how they work together.
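A minimal sketch of the four stages named in the goal, written in plain Python so it runs anywhere. It does not use NVTabular, Merlin Models, or Merlin Systems; every name, embedding, and score below is made up for illustration.

```python
# Toy multi-stage recommender in plain Python (no Merlin APIs are used).
# All data, names, and scores below are invented for illustration.

ITEM_EMBEDDINGS = {
    "item_a": (1.0, 0.0),
    "item_b": (0.9, 0.1),
    "item_c": (0.0, 1.0),
    "item_d": (0.7, 0.3),
}
USER_EMBEDDINGS = {"user_1": (1.0, 0.0)}
SEEN_ITEMS = {"user_1": {"item_a"}}
# Stand-in for the output of a ranking model.
RANKING_SCORES = {"item_a": 0.9, "item_b": 0.4, "item_c": 0.8, "item_d": 0.6}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_id, k):
    """Stage 1: top-k candidates by embedding similarity."""
    user_vec = USER_EMBEDDINGS[user_id]
    by_similarity = sorted(
        ITEM_EMBEDDINGS,
        key=lambda item: dot(user_vec, ITEM_EMBEDDINGS[item]),
        reverse=True,
    )
    return by_similarity[:k]


def filter_candidates(user_id, candidates):
    """Stage 2: drop items the user has already interacted with."""
    seen = SEEN_ITEMS.get(user_id, set())
    return [item for item in candidates if item not in seen]


def rank(candidates):
    """Stage 3: score each surviving candidate."""
    return {item: RANKING_SCORES[item] for item in candidates}


def order(scores, n):
    """Stage 4: return the n best items by score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def recommend(user_id, k=3, n=2):
    candidates = retrieve(user_id, k)
    candidates = filter_candidates(user_id, candidates)
    return order(rank(candidates), n)
```

Calling `recommend("user_1")` retrieves three candidates, removes the already-seen `item_a`, and orders the remaining two by ranking score.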

Constraints:

  • NVTabular doesn't support splitting a workflow into user and item features, so two separate workflows are needed, one for each.

Starting Point:

  • [ ] https://github.com/NVIDIA-Merlin/systems/issues/99 (This has been removed from scope)
  • [ ] #458
  • [ ] #449 - This is an optional requirement (removed)

karlhigley avatar May 05 '22 15:05 karlhigley

@karlhigley Hello. Any updates on the estimated deadline for adding NVT to the ensemble graph? We will need this feature for the RecSys tutorial.

rnyak avatar Jun 17 '22 14:06 rnyak

So, based on our conversation the other day, I'm not sure we actually do need NVT in the ensemble graph in order to send raw data in the request, since it should be possible to map between raw ids and categorified ids just with the way we ingest data into Feast and the way we build the ensemble graph.

The other option, since Merlin Systems already has an operator for running an NVT Workflow, would be to construct separate Workflows for user and item features, join together the results for offline model training, and run those Workflows independently at serving time using the existing NVT Workflow operator in Systems.
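To illustrate the shape of that second option, here is a rough sketch in plain Python. It is not NVTabular or Merlin Systems code: `FeatureWorkflow`, `fit_categories`, and all of the data are hypothetical stand-ins for two independently fitted Workflows whose outputs are joined offline and run separately at serving time.

```python
# Plain-Python stand-in for two independent feature workflows.
# Nothing here is NVTabular or Merlin Systems API; all names are hypothetical.

def fit_categories(values):
    """Learn a value -> contiguous id mapping; 0 is reserved for unknowns."""
    return {value: idx for idx, value in enumerate(sorted(set(values)), start=1)}


class FeatureWorkflow:
    """Encodes the listed columns of a row using vocabularies learned in fit()."""

    def __init__(self, columns):
        self.columns = columns
        self.vocabs = {}

    def fit(self, rows):
        for col in self.columns:
            self.vocabs[col] = fit_categories(row[col] for row in rows)
        return self

    def transform(self, row):
        return {col: self.vocabs[col].get(row[col], 0) for col in self.columns}


users = [
    {"user_id": "u7", "age_group": "25-34"},
    {"user_id": "u3", "age_group": "18-24"},
]
items = [
    {"item_id": "sku-9", "category": "shoes"},
    {"item_id": "sku-2", "category": "hats"},
]
interactions = [("u7", "sku-2"), ("u3", "sku-9")]

# One workflow per entity, fitted independently of the other.
user_workflow = FeatureWorkflow(["user_id", "age_group"]).fit(users)
item_workflow = FeatureWorkflow(["item_id", "category"]).fit(items)

# Offline: run both workflows, then join their outputs per interaction.
users_by_id = {row["user_id"]: row for row in users}
items_by_id = {row["item_id"]: row for row in items}
training_rows = [
    {**user_workflow.transform(users_by_id[u]), **item_workflow.transform(items_by_id[i])}
    for u, i in interactions
]

# Serving: each workflow can run on its own, e.g. user features only.
serving_user_features = user_workflow.transform(
    {"user_id": "u3", "age_group": "18-24"}
)
```

The point of the split is that the user side can be transformed at request time without touching the item side, while training still sees the joined result.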

Based on the understanding that there are two approaches for doing this already available, we're not actively working on adding any new functionality to better support NVT in the ensemble graph right now (which would likely require some fundamental capabilities to build/label/extract sub-graphs from a Workflow to do effectively).

karlhigley avatar Jun 17 '22 17:06 karlhigley

(If neither of those approaches works, then we should talk about getting this epic issue prioritized on the roadmap. @EvenOldridge would be a good person to talk to about that.)

karlhigley avatar Jun 17 '22 17:06 karlhigley

The second option is probably the best fit right now. I agree it would be nice to be able to automatically split the graph into user and item aspects and then deploy each of them separately, but it's not possible without subgraphs in NVT, and as Karl says, there are straightforward workarounds.

EvenOldridge avatar Jun 17 '22 23:06 EvenOldridge

@rnyak can you update the issue for the tutorial to include this work of splitting the user and item feature workflows?

EvenOldridge avatar Jun 18 '22 00:06 EvenOldridge

@rnyak, could you please provide details such as the problem, goal, constraints and starting point at the top? Some of this info may be available in the comments, so please summarize it there. If you run into any difficulties, please let me know.

viswa-nvidia avatar Jun 29 '22 22:06 viswa-nvidia

Updated the ticket. @rnyak can you make sure that all next steps are properly captured.

EvenOldridge avatar Jul 04 '22 21:07 EvenOldridge

@karlhigley how is this different from the e2e example we have here? Will this be an enhancement to the e2e example, e.g. adding filtering w/ Bloom filters?

sohn21c avatar Jul 08 '22 18:07 sohn21c

Add a filtering stage (Bloom filters for previous interactions)? Is this in scope? Bloom filtering is not in scope. The demo paper for RecSys should have filtering, so we need a better definition of the filtering stage. We should have a simple implementation for the demo (it need not be Bloom filters). We will filter by category; we need to check whether the dataset supports this. The H&M dataset has category features.
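For reference, filtering by category could be as simple as the sketch below. It is plain Python with invented item ids and categories, not a Merlin Systems operator and not the actual H&M column names.

```python
# Hypothetical item metadata; ids and category values are invented.
ITEM_CATEGORY = {
    "sku-1": "dresses",
    "sku-2": "shoes",
    "sku-3": "dresses",
    "sku-4": "hats",
}


def filter_by_category(candidates, allowed_categories):
    """Keep candidates whose category is allowed, preserving their order.

    Items with no known category are dropped.
    """
    return [
        item for item in candidates
        if ITEM_CATEGORY.get(item) in allowed_categories
    ]


retrieved = ["sku-4", "sku-3", "sku-2", "sku-1", "sku-unknown"]
kept = filter_by_category(retrieved, {"dresses", "shoes"})
```

Because the filter preserves the retrieval order, it can sit between retrieval and ranking without affecting either stage.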

viswa-nvidia avatar Jul 11 '22 15:07 viswa-nvidia

@rnyak are there categories we could filter on, or should we look into adding a synthetic 'in season/stock' column that we could filter on?

EvenOldridge avatar Jul 13 '22 16:07 EvenOldridge

Pushing to arbitration. Examples, documentation, and work in core and systems are not captured here.

viswa-nvidia avatar Jul 15 '22 19:07 viswa-nvidia

I am following up on the discussion, and I apologize for missing it.

@sohn21c - The minimum we want to present is the example you linked.

Provide a clear example of a multi stage recommender pipeline that does retrieval, filtering, ranking and ordering.

I think we need to discuss that goal. When we discussed the proposal for the RecSys 2022 tutorial, we had the filtering stage as optional. We did not talk about ordering. I am happy to include them if the features are available. Do we have a ticket for implementing an ordering op in Merlin Systems?

What we discussed was that the must-haves are:

  • Solve the issue with reverse categorical mapping from categorified id back to the original value #458
  • [FEA] Add NVT model to ensemble DAG of the multi-stage-recsys pipeline to send raw data as a request to TIS

The latter doesn't necessarily require an NVTabular model, though. The goal is to send raw user IDs to the endpoint.
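As a rough illustration of the round trip (raw user id in, raw item ids out), here is a plain-Python sketch. `build_vocab`, `fake_model`, and `handle_request` are hypothetical names, the model is a stub, and none of this is NVTabular or Triton code.

```python
# Sketch of a two-way id mapping around a stubbed model.
# All names and data are hypothetical; no Merlin APIs are used.

def build_vocab(raw_ids):
    """Return (raw -> encoded, encoded -> raw); encoded id 0 means unknown."""
    raw_to_encoded = {
        raw: idx for idx, raw in enumerate(sorted(set(raw_ids)), start=1)
    }
    encoded_to_raw = {idx: raw for raw, idx in raw_to_encoded.items()}
    return raw_to_encoded, encoded_to_raw


user_to_encoded, _ = build_vocab(["alice", "bob"])
item_to_encoded, encoded_to_item = build_vocab(["sku-10", "sku-20", "sku-30"])


def fake_model(encoded_user_id):
    """Stand-in for the served model: returns encoded item ids."""
    return [3, 1] if encoded_user_id == 1 else [2]


def handle_request(raw_user_id):
    # Forward mapping: raw user id -> encoded id the model was trained on.
    encoded_user = user_to_encoded.get(raw_user_id, 0)
    encoded_items = fake_model(encoded_user)
    # Reverse mapping: encoded item ids -> original raw values.
    return [encoded_to_item[i] for i in encoded_items]
```

With this shape, the client never sees encoded ids: `handle_request("alice")` takes a raw user id and returns raw item ids.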

bschifferer avatar Jul 18 '22 15:07 bschifferer

@jperez999 to check with @bschifferer and @rnyak regarding an op to look up values and merge them back. If required, a ticket has to be created.

viswa-nvidia avatar Jul 18 '22 16:07 viswa-nvidia

https://github.com/NVIDIA-Merlin/Merlin/pull/474 <--- This PR shows how it's done. It shows the initial request coming in with the raw user id, and then it shows that the inference response has the appropriate raw item id. Should be enough to show how to get this working. @rnyak @bschifferer @EvenOldridge @karlhigley

jperez999 avatar Jul 18 '22 20:07 jperez999

Updated the title to include the scope of code changes required by the RecSys Demo

karlhigley avatar Aug 08 '22 21:08 karlhigley

@rnyak, do you have a doc that captures the details of the planning for the RecSys tutorial?

viswa-nvidia avatar Aug 10 '22 16:08 viswa-nvidia

https://docs.google.com/document/d/14iQH9mA_SR3HJLqaisB6wybmIJidT2yQRSNUslqVy6U/edit

viswa-nvidia avatar Aug 10 '22 16:08 viswa-nvidia

Moved pending tasks to eng-improvement. Closing this ticket as the RecSys work is complete.

viswa-nvidia avatar Sep 12 '22 16:09 viswa-nvidia