
Collocate transformer and predictor to simplify deployment

yuzisun opened this issue 1 year ago • 1 comment

/kind feature

Describe the solution you'd like

Deploying the transformer and predictor as separate services gives you the flexibility to run different numbers of replicas for each. KServe 0.11 already supports collocating the transformer and predictor in the same pod for custom containers; we'd like to provide an option for collocating the transformer with supported serving runtimes as well (e.g. Triton, TorchServe).
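For reference, the custom-container collocation that KServe 0.11 already supports looks roughly like the sketch below. The container names, ports, and `--predictor_host` wiring here are illustrative assumptions, not part of this proposal; the point is that both containers run in one pod and talk over localhost:

```yaml
# Sketch of existing KServe 0.11 custom-container collocation (assumed ports).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-transformer-collocation
spec:
  predictor:
    containers:
    - name: kserve-container          # custom predictor, internal to the pod
      image: kserve/custom-model:latest
      args:
        - --model_name=mnist
        - --http_port=8085            # assumed internal predictor port
    - name: transformer-container     # transformer, exposed as the pod's endpoint
      image: kserve/image-transformer:latest
      args:
        - --model_name=mnist
        - --predictor_host=localhost:8085   # reach the predictor over localhost
      ports:
        - containerPort: 8080
          protocol: TCP
```

The proposal below would extend this pattern so that the predictor side can be a supported serving runtime rather than a custom container, with KServe handling the wiring.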

Anything else you would like to add:

Proposed Inference Service Spec Option 1

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
  annotations:
    serving.kserve.io/collocate: "true"
spec:
  transformer:
    containers:
    - image: kserve/image-transformer:latest
      name: transformer-container
      command:
        - "python"
        - "-m"
        - "model"
      args:
        - --model_name
        - mnist
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1

Proposed Inference Service Spec Option 2

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torch-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    containers:
    - image: kserve/image-transformer:latest
      name: transformer-container
      command:
        - "python"
        - "-m"
        - "model"
      args:
        - --model_name
        - mnist

Benefits:

  • Reduces networking overhead from node-to-node communication
  • Removes the complexity of deploying the transformer and predictor as separate services and tuning the replica ratio for best performance
  • Reduces the risk of incompatible changes between transformer and predictor, since they are now deployed as a single unit

Other considerations:

  • Compare the performance of collocating the transformer and predictor vs. launching them as separate services
  • We would also like to rename transformer, as there is persistent confusion with the "Transformer" model from Google's paper.


yuzisun avatar Nov 26 '23 15:11 yuzisun

We would also like to rename transformer as there is always confusion with the "Transformer Model" from google's paper.

I second this. An alternative may be `processor`, since it does both pre- and post-processing.

cceyda avatar Mar 20 '24 03:03 cceyda