[RMP] Support Offline Batch processing of Recs Generation Pipelines
Problem:
As a user, I would like to run my Merlin Systems inference pipeline in an offline setting. This will allow me to produce a set of recommendations for all users, to be served from a data store, email campaign, etc. I will also be able to conduct rigorous testing and better compare behaviors against other systems, at both the operator and system level.
Goal:
To do this, I need to be able to run my Merlin Systems inference graph without using Triton or the configs generated for it. This will require a new operator executor class that runs the ops in Python instead of Tritonserver. The execution should behave exactly as it does in the Tritonserver setting, meaning each operator should be provided the same inputs and return the same outputs.
- Run an inference operator graph without Tritonserver.
- Does not require any new user-facing API changes.
- Execute the same graph that would be deployed to Tritonserver.
- Execute in a Python process.
Constraints:
- Use the same Merlin Systems graph/ops that were created for the inference pipeline and would run on Tritonserver.
- Swap out the operator executor for a Python (non-Triton) version.
- Allow for all types of graphs, supporting multiple chains and parallel running of ALL available operators.
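The execution model described above can be sketched as a small in-process executor: walk the graph in dependency order and hand each operator the same column-dict inputs and outputs the Triton runtime would. This is a minimal illustration, not the real Merlin Systems API; `Node`, `topological_order`, and `execute` are hypothetical names.

```python
from collections import deque

class Node:
    """One operator in the graph, plus the names of the nodes it depends on."""
    def __init__(self, name, op, deps=()):
        self.name, self.op, self.deps = name, op, list(deps)

def topological_order(nodes):
    # Kahn's algorithm over the dependency edges
    indegree = {n.name: len(n.deps) for n in nodes}
    ready = deque(n for n in nodes if not n.deps)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for other in nodes:
            if node.name in other.deps:
                indegree[other.name] -= 1
                if indegree[other.name] == 0:
                    ready.append(other)
    return order

def execute(nodes, inputs):
    """Run each op in dependency order. Each op receives a dict of columns
    and returns a dict of columns, mirroring the Triton-side contract."""
    columns = dict(inputs)
    for node in topological_order(nodes):
        columns.update(node.op(columns))
    return columns

# Usage: a two-node retrieve -> rank chain, executed entirely in-process
graph = [
    Node("retrieve", lambda c: {"item_ids": [u * 10 for u in c["user_id"]]}),
    Node("rank", lambda c: {"top_item": max(c["item_ids"])}, deps=["retrieve"]),
]
result = execute(graph, {"user_id": [1, 2]})
```

Because ordering is computed from the declared dependencies, the same walk handles multiple chains and parallel branches without any Triton-specific config.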
TODO:
Core
- [x] https://github.com/NVIDIA-Merlin/core/pull/140
- [x] https://github.com/NVIDIA-Merlin/core/pull/141
- [x] https://github.com/NVIDIA-Merlin/core/pull/143
- [x] https://github.com/NVIDIA-Merlin/core/pull/146
Systems
- [x] https://github.com/NVIDIA-Merlin/systems/pull/204
- [x] Validate that we can run a systems ensemble on Dask
Issues
- [x] #461
- [x] #462
- [x] #463
- [x] https://github.com/NVIDIA-Merlin/Merlin/issues/505
- [x] https://github.com/NVIDIA-Merlin/Merlin/issues/506
- [x] https://github.com/NVIDIA-Merlin/Merlin/issues/507
Example
- [ ] #798
### Tasks
- [ ] Create an offline runtime that swaps operators according to usage (e.g. swap the Feast operator for a dataset-merge operator).
- [ ] Ensure every operator returns batch-based results, i.e. Faiss should return a batched representation of its inputs: 2 users in should produce a (2, 100) shape, not (200,).
- [ ] Create an offline example from the current multi-stage example in Merlin.
- [ ] Ensure ensemble export does not prevent using non-Triton runtimes later.
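The batch-shape requirement in the task list can be illustrated with a toy retrieval step (not the real Faiss operator; `retrieve_candidates` is a hypothetical name): keep the batch dimension so downstream ops stay batch-aware.

```python
import numpy as np

def retrieve_candidates(user_embeddings, num_candidates=100):
    """Return num_candidates item ids per user, preserving the batch dim."""
    batch_size = user_embeddings.shape[0]
    # Placeholder ids; real code would query an ANN index such as Faiss.
    flat_ids = np.arange(batch_size * num_candidates)
    # Correct shape: one row of candidate ids per user in the batch.
    return flat_ids.reshape(batch_size, num_candidates)

candidates = retrieve_candidates(np.zeros((2, 16)))
assert candidates.shape == (2, 100)  # not (200,)
```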
Assignees will be Karl / Adam.
This is a prerequisite for cross-FW evaluation
My impression is that batch inference for models is required for cross-FW evaluation, not full batch inference for a whole system. The additional steps in the Systems computation graph (QueryFeast, QueryFaiss, Softmax, filtering, etc.) would likely not be required for batch inference on a single model. Batch inference for the model would have a simpler "training data in -> predictions out" process, which would likely be one step in the Systems graph.
Perhaps we should first build the batch inference functionality (apply nvt transform + use model to predict) including the output format schema, and then that functionality could be shared in cross-FW evaluation and systems-wide batch prediction.
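A minimal sketch of that "training data in -> predictions out" step, under stated assumptions: `IdentityWorkflow` and `SumModel` are stand-ins for an NVTabular workflow and a trained model, and none of these names are real Merlin APIs.

```python
class IdentityWorkflow:
    def transform(self, rows):
        return rows  # a real workflow would apply the fitted preprocessing

class SumModel:
    def predict(self, rows):
        return [sum(row) for row in rows]  # a real model would score features

def batch_predict(workflow, model, rows):
    features = workflow.transform(rows)  # apply the nvt transform
    return model.predict(features)       # use the model to predict

preds = batch_predict(IdentityWorkflow(), SumModel(), [[1, 2], [3, 4]])
# preds == [3, 7]
```

If this function (plus an agreed output schema) existed on its own, both cross-FW evaluation and system-wide batch prediction could call it as a shared building block.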
We do have some batch prediction functionality for models already, but it's not quite structured in a way that would make it a reasonable foundation for batch processing of graphs. I think we could massage it in that direction though and try to standardize how batch graph processing works in Merlin Core by taking what exists and refactoring it in the right direction.
@karlhigley do you think we should add an example for it?
I think we should add an example for every new piece of significant functionality (i.e. almost all roadmap issues.)
https://github.com/NVIDIA-Merlin/core/pull/352 https://github.com/NVIDIA-Merlin/systems/pull/376
This is not considered done until we can run all Systems operators with a Dask executor to create recommendations. Currently, some Systems operators work with batches of input data, as shown in 1022. We need to make all operators work with batches of incoming data.
@jperez999 Could you add appropriate tasks to the list in the description?
(People don't generally scroll down to see the latest comments when we look at WIP issues to track their progress, so a comment helps but a description update is better.)
We need to be able to swap out certain operators based on the runtime. E.g. when running the Dask executor for offline batch, it is not necessary to run the feature store operator unless we are testing against it; you could run a dataset-merge operator instead, using offline features stored in a Parquet file. Please refer to the task list created for further tracking.
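The swap described here can be sketched as follows: an offline runtime replaces the feature-store lookup with a plain dataframe merge against features read from a Parquet file. `DatasetMergeOp` is a hypothetical name for illustration, not a real Merlin operator.

```python
import pandas as pd

class DatasetMergeOp:
    """Offline stand-in for a feature-store lookup (e.g. QueryFeast)."""
    def __init__(self, feature_df, on="user_id"):
        self.feature_df = feature_df
        self.on = on

    def transform(self, df):
        # Same contract as the online lookup: ids in, ids plus features out
        return df.merge(self.feature_df, on=self.on, how="left")

# In practice the features would come from a Parquet file,
# e.g. features = pd.read_parquet("user_features.parquet")
features = pd.DataFrame({"user_id": [1, 2], "age": [31, 45]})
out = DatasetMergeOp(features).transform(pd.DataFrame({"user_id": [2, 1]}))
# out["age"].tolist() == [45, 31]
```

Because the offline op keeps the online lookup's input/output contract, the rest of the graph runs unchanged under either runtime.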