MLServer: Out of the box support of graph optimiser

Status: Open. saeid93 opened this issue 2 years ago (2 comments).

Feature request: Running an optimization step on a deep learning model before serving it is becoming very common in machine learning deployment, so out-of-the-box support in MLServer could be beneficial. TVM is one example of such an optimiser. This could be added as a config option in the MLServer model config file. An easy starting point would be to add support for Optimum to the HuggingFace runtime and, if the feedback is positive, gradually turn it into a general feature.

Mar 05 '23 17:03 saeid93

Hey @saeid93,

That's a great point.

We're currently looking into ways to introduce optimisers within the Seldon stack. It's not 100% clear yet, though, whether this makes sense at the inference-server level or whether it should happen upstream (e.g. within the orchestrator, like Seldon Core).

BTW, regarding Optimum, this should already be part of the HF runtime :)
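For reference, here is a minimal sketch of what enabling Optimum in the HF runtime might look like. It assumes that the `mlserver-huggingface` runtime reads its options from `parameters.extra` in `model-settings.json` and exposes an `optimum_model` flag. The model name `transformer`, the `text-generation` task and the `distilgpt2` checkpoint are illustrative choices only, so check the MLServer docs for the version you have installed.

```python
import json
from pathlib import Path

# Sketch of a model-settings.json for MLServer's HuggingFace runtime.
# Assumption: the runtime reads its options from parameters.extra and
# supports an "optimum_model" flag that loads the model through Optimum
# instead of plain transformers.
model_settings = {
    "name": "transformer",  # hypothetical model name
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parameters": {
        "extra": {
            "task": "text-generation",
            "pretrained_model": "distilgpt2",  # example Hugging Face Hub checkpoint
            "optimum_model": True,  # assumed flag enabling Optimum
        }
    },
}


def write_model_settings(model_dir: Path) -> Path:
    """Write model-settings.json into model_dir and return its path."""
    model_dir.mkdir(parents=True, exist_ok=True)
    path = model_dir / "model-settings.json"
    path.write_text(json.dumps(model_settings, indent=2))
    return path
```

With the file written into a model folder, running `mlserver start` on that folder should pick up the settings.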

Mar 06 '23 09:03 adriangonz

Hi @adriangonz,

I'm glad to hear that's something on the agenda. Personally, I think it should be part of the model servers, and upstream frameworks should be responsible for high-level tasks like routing. However, I'm very interested to see how this decision will be made for Seldon/MLServer in the future. Just saw the Optimum commit 😁

Mar 06 '23 20:03 saeid93
