kubedl
[summer of code] lightweight traffic control for inference
Topic Description
For now, istio handles traffic distribution when inference serves multiple model versions with different traffic weights. However, that is like using a sledgehammer to crack a nut. We would like to design a tiny component that handles traffic distribution and removes the external dependency (for example, a companion pod per Inference). Such a component could also apply serving-specific optimizations in the traffic layer, such as batching.
...
Assignee
TODO
/assign @ccchenjiahuan