kubedl

[summer of code] light-weighted traffic control for inference

Open • SimonCqk opened this issue 3 years ago • 1 comment

Topic Description

For now, Istio handles traffic distribution when an Inference serves multiple model versions with different traffic weights. However, that is like using a sledgehammer to crack a nut. We'd like to design a tiny dedicated component to handle traffic distribution (for example, a companion pod per Inference) and get rid of the external dependency. Such a component could also apply serving-specific optimizations at the traffic layer, such as batching...
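To make the idea concrete, here is a minimal sketch of how such a component might pick a model version for each request according to its traffic weight. It is not KubeDL code: the `Backend` and `Splitter` types and the `NewSplitter` and `Next` functions are hypothetical names used only for illustration, and the selection strategy (smooth weighted round-robin) is one possible choice, not something this issue prescribes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Backend is a hypothetical model version with a relative traffic weight.
type Backend struct {
	Name   string
	Weight int
}

// Splitter distributes requests across backends in proportion to their
// weights using smooth weighted round-robin.
type Splitter struct {
	mu       sync.Mutex
	backends []Backend
	current  []int // running score per backend
	total    int   // sum of all weights
}

// NewSplitter validates the weights and returns a ready-to-use Splitter.
func NewSplitter(backends []Backend) (*Splitter, error) {
	total := 0
	for _, b := range backends {
		if b.Weight < 0 {
			return nil, fmt.Errorf("backend %q has negative weight %d", b.Name, b.Weight)
		}
		total += b.Weight
	}
	if total == 0 {
		return nil, errors.New("at least one backend needs a positive weight")
	}
	return &Splitter{
		backends: append([]Backend(nil), backends...), // defensive copy
		current:  make([]int, len(backends)),
		total:    total,
	}, nil
}

// Next returns the name of the backend that should serve the next request.
// Over any window of `total` calls, each backend is chosen exactly
// `Weight` times.
func (s *Splitter) Next() string {
	s.mu.Lock()
	defer s.mu.Unlock()

	best := 0
	for i, b := range s.backends {
		s.current[i] += b.Weight
		if s.current[i] > s.current[best] {
			best = i
		}
	}
	s.current[best] -= s.total
	return s.backends[best].Name
}

func main() {
	// Assumed example: 90% of traffic to v1, 10% to a canary v2.
	s, err := NewSplitter([]Backend{{Name: "v1", Weight: 9}, {Name: "v2", Weight: 1}})
	if err != nil {
		panic(err)
	}
	counts := map[string]int{}
	for i := 0; i < 100; i++ {
		counts[s.Next()]++
	}
	fmt.Println(counts)
}
```

A deterministic scheme like this spreads the low-weight version evenly through the request stream instead of relying on random sampling, which may matter for low-traffic canaries. A real component would still need to proxy the requests and watch the Inference spec for weight changes, which this sketch leaves out.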

Assignee

TODO

SimonCqk, Aug 05 '21 04:08

/assign @ccchenjiahuan

ccchenjiahuan, Aug 08 '21 12:08