
[RMP] Support Offline Batch processing of Recs Generation Pipelines

Open jperez999 opened this issue 2 years ago • 11 comments

Problem:

As a user, I would like to run my Merlin Systems inference pipeline in an offline setting. This will allow me to produce a set of recommendations for all users, to be served from a data store, email campaign, etc. I will also be able to conduct rigorous testing and better compare behaviors against other systems, at both the operator and the system level.

Goal:

To do this, I need to be able to run my Merlin Systems inference graph without using Triton or the configs generated for it. This will require a new operator executor class that runs the ops in Python instead of on Triton Inference Server. Execution should behave exactly as it does in the Triton setting: each operator should receive the same inputs and return the same outputs.

  • Run an inference operator graph without Triton.
  • Require no new user-facing API changes.
  • Execute the same graph that would be deployed to Triton.
  • Execute in a Python process.

Constraints:

  • Use the same Merlin Systems graph/ops that were created for the inference pipeline and would run on Triton.
  • Swap the operator executor for a Python (non-Triton) version.
  • Allow all types of graphs, supporting multiple chains and parallel execution of all available operators.
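The constraints above boil down to keeping the graph and operators unchanged and replacing only the executor. As a rough illustration of what such an in-process executor could look like, here is a self-contained toy sketch (the class and operator names are hypothetical, not the actual `merlin.systems` API): it walks an operator DAG in topological order, giving each op the same input/output contract it would have under Triton.

```python
from collections import deque


class AddColumn:
    """Toy operator with the same transform contract an inference op
    would have under Triton (hypothetical; not the real Merlin API)."""

    def __init__(self, name, value):
        self.name, self.value = name, value

    def transform(self, batch):
        # batch is a dict of column name -> list of values
        out = dict(batch)
        out[self.name] = [self.value] * len(next(iter(batch.values())))
        return out


class PythonExecutor:
    """Runs an operator DAG in-process in topological order, so each op
    receives the same inputs (and returns the same outputs) it would
    see if the graph were deployed to Triton."""

    def __init__(self, graph):
        # graph: {op: [upstream ops]}; ops with no upstreams read the
        # raw request batch
        self.graph = graph

    def run(self, batch):
        indegree = {op: len(deps) for op, deps in self.graph.items()}
        downstream = {op: [] for op in self.graph}
        for op, deps in self.graph.items():
            for dep in deps:
                downstream[dep].append(op)

        ready = deque(op for op, count in indegree.items() if count == 0)
        results, order = {}, []
        while ready:  # Kahn's topological sort over the DAG
            op = ready.popleft()
            order.append(op)
            deps = self.graph[op]
            op_input = dict(batch) if not deps else {}
            for dep in deps:  # merge upstream outputs into one input
                op_input.update(results[dep])
            results[op] = op.transform(op_input)
            for nxt in downstream[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        return results[order[-1]]
```

Usage would look like `PythonExecutor({retrieval: [], ranking: [retrieval]}).run({"user_id": [10, 11]})`, chaining a retrieval op into a ranking op with no Triton process involved.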

TODO:

Core

  • [x] https://github.com/NVIDIA-Merlin/core/pull/140
  • [x] https://github.com/NVIDIA-Merlin/core/pull/141
  • [x] https://github.com/NVIDIA-Merlin/core/pull/143
  • [x] https://github.com/NVIDIA-Merlin/core/pull/146

Systems

  • [x] https://github.com/NVIDIA-Merlin/systems/pull/204
  • [x] Validate that we can run a systems ensemble on Dask

Issues

  • [x] #461
  • [x] #462
  • [x] #463
  • [x] https://github.com/NVIDIA-Merlin/Merlin/issues/505
  • [x] https://github.com/NVIDIA-Merlin/Merlin/issues/506
  • [x] https://github.com/NVIDIA-Merlin/Merlin/issues/507

Example

  • [ ] #798
### Tasks
- [ ] Create an offline runtime that swaps operators according to usage (e.g. swap the Feast operator for a dataset merge operator).
- [ ] Ensure every operator returns batch-shaped results, i.e. Faiss should return a batched representation of its inputs: 2 users in should produce a (2, 100) shape, not (200,).
- [ ] Create an offline example from the current multi-stage example in Merlin.
- [ ] Ensure ensemble export does not prevent using non-Triton runtimes later.
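The batch-shape requirement in the second task can be pictured with a small NumPy sketch: a candidate-retrieval step (such as a Faiss lookup) should keep the batch dimension rather than flattening all candidates into one vector. The buffer contents here are made up for illustration.

```python
import numpy as np

batch_size, topk = 2, 100

# A flat candidate buffer, as a nearest-neighbor search might return
# it: one id per (user, candidate) pair, with the batch dim lost.
flat = np.arange(batch_size * topk)       # shape (200,)

# The batch-aware result we want: one row of top-k candidates per user.
batched = flat.reshape(batch_size, topk)  # shape (2, 100)
```

With the batched shape, downstream operators can tell which candidates belong to which user without guessing at the batch size.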

jperez999 avatar Jun 27 '22 22:06 jperez999

Assignees will be Karl / Adam.

viswa-nvidia avatar Jul 13 '22 17:07 viswa-nvidia

This is a prerequisite for cross-FW evaluation

sohn21c avatar Jul 13 '22 21:07 sohn21c

My impression is that batch inference for models is required for cross-FW evaluation, not the full batch inference for a system. The additional steps in the Systems' computation graph (QueryFeast, QueryFaiss, Softmax, filtering, etc) would likely not be required for batch inference on a single Model. Batch inference for the model would have a simpler "training data in -> predictions out" process, which would likely be a step in the Systems graph.

Perhaps we should first build the batch inference functionality (apply nvt transform + use model to predict) including the output format schema, and then that functionality could be shared in cross-FW evaluation and systems-wide batch prediction.
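The "training data in -> predictions out" step described above could be sketched as a tiny shared function. The `Doubler` and `SumModel` classes below are stand-ins for an NVTabular workflow and a trained model (entirely hypothetical; neither reflects the real APIs), just to show the shape of the shared batch-inference path.

```python
class Doubler:
    """Stand-in for a fitted NVTabular workflow (hypothetical)."""

    def transform(self, rows):
        # Apply the same preprocessing used at training time
        return [2 * x for x in rows]


class SumModel:
    """Stand-in for a trained model (hypothetical)."""

    def predict(self, rows):
        # One score per input row
        return [x + 1 for x in rows]


def batch_predict(workflow, model, rows):
    """Shared batch-inference step: preprocess, then score."""
    return model.predict(workflow.transform(rows))
```

Both cross-FW evaluation and systems-wide batch prediction could then call `batch_predict` with their own workflow and model objects.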

nv-alaiacano avatar Jul 18 '22 18:07 nv-alaiacano

We do have some batch prediction functionality for models already, but it's not quite structured in a way that would make it a reasonable foundation for batch processing of graphs. I think we could massage it in that direction though and try to standardize how batch graph processing works in Merlin Core by taking what exists and refactoring it in the right direction.

karlhigley avatar Jul 19 '22 16:07 karlhigley

@karlhigley do you think we should add an example for it?

bschifferer avatar Oct 17 '22 21:10 bschifferer

I think we should add an example for every new piece of significant functionality (i.e. almost all roadmap issues.)

karlhigley avatar Oct 20 '22 14:10 karlhigley

https://github.com/NVIDIA-Merlin/core/pull/352 https://github.com/NVIDIA-Merlin/systems/pull/376

jperez999 avatar Jun 27 '23 16:06 jperez999

This is not considered done until we can run all Systems operators with a Dask executor to create recommendations. Currently some Systems operators work with batches of input data, as shown in 1022. We need to make all operators work with batches of incoming data.

jperez999 avatar Jul 05 '23 16:07 jperez999

@jperez999 Could you add appropriate tasks to the list in the description?

karlhigley avatar Jul 05 '23 16:07 karlhigley

(People don't generally scroll down to see the latest comments when we look at WIP issues to track their progress, so a comment helps but a description update is better.)

karlhigley avatar Jul 05 '23 16:07 karlhigley

We need to be able to swap out certain operators based on the runtime. E.g. when running the DaskExecutor for offline batch, it is not necessary to run the feature store operator unless we are testing against it; instead, you could run a dataset merge operator using offline features stored in a parquet file. Please refer to the task list in the description for further tracking.
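The swap described above could be sketched as a runtime that maps online-only operator types to offline replacements before executing the graph. All names here (`QueryFeast`, `DatasetMerge`, `OfflineRuntime`) are illustrative stand-ins, not the actual Merlin Systems classes.

```python
class QueryFeast:
    """Stand-in for an online feature-store lookup operator."""


class DatasetMerge:
    """Offline replacement: joins features from a static table
    (e.g. one loaded from a parquet file) instead of calling a
    feature store."""

    def __init__(self, feature_table):
        self.feature_table = feature_table


class OfflineRuntime:
    """Swaps online-only operators for offline equivalents before
    executing the graph (illustrative, not the real Merlin API)."""

    def __init__(self, offline_features):
        # Maps online-only operator types to replacement factories
        self._swaps = {QueryFeast: lambda _op: DatasetMerge(offline_features)}

    def adapt(self, operators):
        # Replace swappable operators; pass everything else through
        return [self._swaps.get(type(op), lambda op: op)(op)
                for op in operators]
```

Under this shape, the same graph definition is reused everywhere, and only the runtime decides which concrete operator executes each step.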

jperez999 avatar Jul 05 '23 16:07 jperez999