kserve
Collocate transformer and predictor to simplify deployment
/kind feature
Describe the solution you'd like:
Deploying the transformer and predictor as separate services gives you the flexibility of running a different number of replicas for each, but it also adds a network hop between them and makes the deployment more complex to manage. KServe 0.11 already supports collocating the transformer and predictor in the same pod for custom containers; we'd like to provide an option for collocating a transformer with the supported serving runtimes as well (e.g. Triton, TorchServe).
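The two deployment modes differ mainly in how the transformer reaches the predictor. Below is a minimal Python sketch of that difference. It assumes only that containers in the same Kubernetes pod share a network namespace, so a collocated predictor is reachable on `localhost`. The function name, the default port `8085`, and the `<name>-predictor.<namespace>` host pattern are illustrative assumptions, not KServe's actual values.

```python
def predictor_endpoint(isvc_name: str, namespace: str, collocated: bool, port: int = 8085) -> str:
    """Return the base URL a transformer would call to reach its predictor.

    Hypothetical helper: the host pattern and the default port are assumptions
    for illustration; check the values your runtime actually uses.
    """
    if collocated:
        # Containers in one pod share a network namespace, so the predictor
        # is reachable over loopback without leaving the node.
        return f"http://localhost:{port}"
    # Separate services: the request is resolved through cluster DNS and may
    # cross nodes before it reaches a predictor replica.
    return f"http://{isvc_name}-predictor.{namespace}"


if __name__ == "__main__":
    print(predictor_endpoint("torch-transformer", "default", collocated=True))
    print(predictor_endpoint("torch-transformer", "default", collocated=False))
```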
Anything else you would like to add:
Proposed Inference Service Spec Option 1
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
  annotations:
    serving.kserve.io/collocate: "true"
spec:
  transformer:
    containers:
      - image: kserve/image-transformer:latest
        name: transformer-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - mnist
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
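Kubernetes annotation values must be strings, so the proposed `serving.kserve.io/collocate` annotation needs the quoted value `"true"`; an unquoted `true` is parsed as a YAML boolean and rejected by the API server. The sketch below shows how a controller might read this annotation from the parsed metadata. The helper is hypothetical and only illustrates the string comparison; KServe's real controller is written in Go.

```python
COLLOCATE_ANNOTATION = "serving.kserve.io/collocate"  # annotation name proposed in this issue


def is_collocated(metadata: dict) -> bool:
    """Return True if the InferenceService metadata opts into collocation.

    Hypothetical helper: annotation values arrive as strings, so compare
    against "true" case-insensitively instead of relying on a YAML boolean.
    """
    annotations = metadata.get("annotations") or {}
    value = annotations.get(COLLOCATE_ANNOTATION, "false")
    return str(value).strip().lower() == "true"


if __name__ == "__main__":
    meta = {"name": "torch-transformer", "annotations": {COLLOCATE_ANNOTATION: "true"}}
    print(is_collocated(meta))
```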
Proposed Inference Service Spec Option 2
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    containers:
      - image: kserve/image-transformer:latest
        name: transformer-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - mnist
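With Option 2 no annotation is needed: collocation is implied by the shape of the predictor spec, which carries both a `model` and an extra container. The sketch below, operating on the spec parsed into a Python dict, shows one way such a check could work. It assumes the transformer container is identified by the name `transformer-container`, as in the example; the helper itself is hypothetical.

```python
TRANSFORMER_CONTAINER = "transformer-container"  # container name used in the example spec


def collocated_transformer(predictor: dict):
    """Return the transformer container if the predictor spec collocates one.

    Hypothetical helper for Option 2: a predictor that declares a `model`
    (served by a runtime such as TorchServe) plus a container with the
    transformer name is treated as a collocated deployment.
    """
    if "model" not in predictor:
        return None
    for container in predictor.get("containers") or []:
        if container.get("name") == TRANSFORMER_CONTAINER:
            return container
    return None


spec = {
    "predictor": {
        "model": {
            "modelFormat": {"name": "pytorch"},
            "storageUri": "gs://kfserving-examples/models/torchserve/image_classifier/v1",
        },
        "containers": [
            {
                "image": "kserve/image-transformer:latest",
                "name": "transformer-container",
                "command": ["python", "-m", "model"],
                "args": ["--model_name", "mnist"],
            }
        ],
    }
}

transformer = collocated_transformer(spec["predictor"])
print(transformer["image"] if transformer else "not collocated")  # → kserve/image-transformer:latest
```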
Benefits:
- Reduce the networking overhead of node-to-node communication between the transformer and predictor.
- Reduce the complexity of deploying the transformer and predictor as separate services and tuning their replica ratio for best performance.
- Reduce the risk of incompatible changes between the transformer and predictor, since they can now be deployed as a single unit.
Other considerations:
- Compare the performance of collocating the transformer and predictor versus launching them as separate services.
- We would also like to rename the transformer, as it is often confused with the "Transformer" model from Google's paper.
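For the performance comparison in the first item above, a simple starting point is to send the same request repeatedly to each deployment and compare latency percentiles. The sketch below uses only the Python standard library. The `send_request` callables stand in for whatever client call hits the collocated or the separate-service endpoint, and the warm-up count and chosen percentiles are arbitrary, not KServe recommendations. It measures sequential latency only; throughput under concurrent load would need a separate test.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(send_request: Callable[[], object], runs: int = 100, warmup: int = 5) -> Dict[str, float]:
    """Call `send_request` repeatedly and report latency statistics in milliseconds.

    `send_request` is a placeholder for a client call against one deployment
    (collocated or separate services). `runs` must be at least 2.
    """
    for _ in range(warmup):
        send_request()  # warm up connections and caches before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        samples.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; "inclusive" keeps every estimate within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p99_ms": cuts[98], "mean_ms": statistics.fmean(samples)}


def compare(collocated: Callable[[], object], separate: Callable[[], object], runs: int = 100) -> Dict[str, Dict[str, float]]:
    """Run the same latency measurement against both deployment modes."""
    return {
        "collocated": measure_latency(collocated, runs),
        "separate": measure_latency(separate, runs),
    }
```

Keeping the request identical across both modes isolates the deployment topology as the only variable being compared.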
> We would also like to rename transformer as there is always confusion with the "Transformer Model" from Google's paper.

I second this. An alternative might be "processor", as it does pre- and post-processing.
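The name "processor" matches the component's role: it wraps the model call with preprocessing on the way in and postprocessing on the way out. The stdlib-only sketch below illustrates that flow in a single process, roughly what collocation enables, with the predictor reduced to a local function. The `Processor` class, its method names, the pixel scaling, and the toy predictor are hypothetical illustrations, not KServe's SDK.

```python
from typing import Callable, Dict, List


class Processor:
    """Hypothetical pre/post-processing wrapper around a predictor call."""

    def __init__(self, predict: Callable[[List[List[float]]], List[List[float]]], labels: List[str]):
        self.predict = predict  # e.g. a call to a collocated predictor on localhost
        self.labels = labels

    def preprocess(self, request: Dict) -> List[List[float]]:
        # Scale raw 0-255 pixel values to the 0-1 range a model might expect.
        return [[px / 255.0 for px in image] for image in request["instances"]]

    def postprocess(self, scores: List[List[float]]) -> Dict:
        # Map each score vector to the label with the highest score.
        return {"predictions": [self.labels[max(range(len(s)), key=s.__getitem__)] for s in scores]}

    def __call__(self, request: Dict) -> Dict:
        return self.postprocess(self.predict(self.preprocess(request)))


def toy_predict(batch: List[List[float]]) -> List[List[float]]:
    # Stand-in model: scores are [1 - mean intensity, mean intensity]
    # for the labels ["dark", "bright"].
    out = []
    for image in batch:
        mean = sum(image) / len(image)
        out.append([1.0 - mean, mean])
    return out


processor = Processor(toy_predict, labels=["dark", "bright"])
print(processor({"instances": [[0, 10, 20], [250, 255, 240]]}))  # → {'predictions': ['dark', 'bright']}
```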