seldon-inference-pipelines
Examples of inference pipelines implemented using https://github.com/SeldonIO/seldon-core
Description
This repo contains a set of practice inference graphs implemented with Seldon Core. The pipelines in the seldon folder are implemented using Seldon's 1st gen custom Python package, while the pipelines in the mlserver folder are implemented using Seldon's newer serving platform MLServer (Serving Custom Models) together with the Seldon Inference Graph.
NOTE: This repo is shared for learning purposes; some of the pipelines implemented here might not have real-world use cases, and they are not fully tested.
Pull requests, suggestions, and additions to the list of pipelines for future implementation are highly appreciated.
Inference graphs implemented using 1st gen Seldon
Pipelines from InferLine: latency-aware provisioning and scaling for prediction serving pipelines
- Cascade
- Ensemble
- Preprocess
- Video Monitoring

and the following pipelines:

- audio-qa: Audio to text -> Question Answering
- audio-sent: Audio to text -> Sentiment Analysis
- nlp: Language Identification -> Translate French to English -> Summarisation
- sum-qa: Summarisation -> Question Answering
- video: Object Detection -> Object Classification
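
For context, each node in these graphs is a model class served by Seldon's 1st gen Python wrapper: a plain class exposing a predict method that the graph orchestrator calls, forwarding its output to the next node. Below is a minimal sketch; the class name and pass-through logic are placeholders, not one of the actual pipeline steps.

```python
# Sketch of a 1st gen Seldon Python model wrapper. A container built around
# this class is served with `seldon-core-microservice` and chained to other
# nodes via a SeldonDeployment graph spec.

class AudioToText:
    def __init__(self):
        # Load the underlying model once at container startup
        # (e.g. a speech-to-text model; omitted in this sketch).
        self.model = None

    def predict(self, X, features_names=None):
        # X is the decoded request payload; whatever is returned here
        # becomes the input of the next node in the inference graph.
        return X
```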
Inference graphs implemented using MLServer
- audio-qa: Audio to text -> Question Answering
- audio-sent: Audio to text -> Sentiment Analysis
- nlp: Language Identification -> Translate French to English -> Summarisation
- sum-qa: Summarisation -> Question Answering
- video: Object Detection -> Object Classification
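
For comparison, an MLServer pipeline step is a custom runtime: a subclass of mlserver.MLModel with async load and predict hooks speaking the V2 inference protocol. A minimal sketch follows; the class name and echo logic are illustrative, not one of the actual steps in this repo.

```python
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class SentimentStep(MLModel):
    """One pipeline step, exposed by MLServer over REST/gRPC (V2 protocol)."""

    async def load(self) -> bool:
        # Load model weights once at startup (omitted in this sketch).
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Placeholder logic: echo the first input tensor back as the output.
        data = list(payload.inputs[0].data)
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="sentiment",
                    shape=[len(data)],
                    datatype="BYTES",
                    data=data,
                )
            ],
        )
```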
DockerHub
Pre-built container images are also available here. So if you just want to try the pipelines out, you can deploy the YAML files on your Kubernetes cluster as they are.
Relevant Projects
Some academic and industrial projects that could be used as a source of inference pipelines for future implementations.
Systems-related Academic Papers
- InferLine: latency-aware provisioning and scaling for prediction serving pipelines
- GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
- FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees
- Rim: Offloading Inference to the Edge
- Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines
- Scrooge: A Cost-Effective Deep Learning Inference System
- Nexus: A GPU Cluster Engine for Accelerating DNN-Based Video Analysis
- VideoEdge: Processing Camera Streams using Hierarchical Clusters
- Live Video Analytics at Scale with Approximation and Delay-Tolerance
- Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing
- XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse
ML Theory-related Academic Papers
- On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
- Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- PaLM: Scaling Language Modeling with Pathways
- Language Model Cascades
Software Engineering-related Academic Papers
- Understanding the Complexity and Its Impact on Testing in ML-Enabled Systems
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming
- 3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
- Feature Interactions on Steroids: On the Composition of ML Models
Industrial Projects
Load Tester
This repo also includes a small async load tester for sending workloads to the models/pipelines. You can find it under the async load tester folder.
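To give an idea of the approach, here is a minimal sketch of an async load generator built on asyncio and aiohttp; the endpoint URL, payload, and parameter values are placeholders, not the repo's actual configuration.

```python
import asyncio
import time

import aiohttp

# Placeholders: point these at the Seldon/MLServer service you deployed.
URL = "http://localhost:8080/v2/models/pipeline/infer"
PAYLOAD = {
    "inputs": [
        {"name": "text", "shape": [1], "datatype": "BYTES", "data": ["hello"]}
    ]
}


async def one_request(session: aiohttp.ClientSession) -> float:
    # Fire a single inference request and return its latency in seconds.
    start = time.perf_counter()
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()
    return time.perf_counter() - start


async def main(total: int = 100, concurrency: int = 10) -> None:
    # Bound in-flight requests with a semaphore to model a fixed client pool.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(session: aiohttp.ClientSession) -> float:
        async with sem:
            return await one_request(session)

    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(*[bounded(session) for _ in range(total)])
    print(f"mean latency: {sum(latencies) / len(latencies):.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```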
Sources of Models
Audio and Text Models
Source:
Image Models
Source:
Please give this repo a star if it helped you learn something new :)
TODOs (sorted by priority)
- Re-implement pipelines in Seldon V2
- Add an example of using shared models in pipelines using V2
- Example of multi-model request propagation
- Example implementation using Nvidia Triton Server as the base containers instead of MLServer
- Examples of model load/unload in Triton and MLServer
- GPU examples with fractional GPUs
- Send image/audio/text in a compressed format
- Add performance evaluation scripts and load tester
- Complete unfinished pipelines
- Examples of using the Triton Client to interact with MLServer examples
- Examples of using Triton Inference Server as the serving backend
- Pipeline implementations in the upcoming Seldon Core V2
- Examples of integration with autoscalers (built-in autoscaler, VPA, and event-driven autoscalers like KEDA)
- Implement a GPT-2 -> DALL-E pipeline inspired by dalle-runtime