
Dynamic change of batch size

Open saeid93 opened this issue 2 years ago • 7 comments

In some cases we need to be able to change some of the configurations of a deployed model, such as the batch size, on the fly without reloading the model. I think this could be implemented by adding an endpoint that changes the model settings' values.
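
For reference, here is a minimal sketch of the kind of batching fields in model-settings.json that such an endpoint could target (the field names follow MLServer's ModelSettings, but the runtime and values are purely illustrative):

```bash
# Illustrative model-settings.json for a model with adaptive batching enabled.
# max_batch_size / max_batch_time are the values we'd like to update on the fly.
cat > model-settings.json <<'EOF'
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "max_batch_size": 32,
  "max_batch_time": 0.5
}
EOF
```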

saeid93 avatar Jan 06 '23 19:01 saeid93

Hey @saeid93 ,

A big chunk of logic (like the adaptive batcher) currently assumes it only needs to be triggered when a model is loaded / unloaded / reloaded, so this wouldn't be a trivial change. We would also need to be careful about which options can be changed on the fly, as we wouldn't want a user modifying the model's name, version or runtime.

Is there any reason why reloading the model wouldn't work in this case?

adriangonz avatar Jan 10 '23 16:01 adriangonz

Hey @adriangonz , The main issue is that reloading the model imposes downtime for changes that technically don't require a reload, such as the batching variables. E.g. changing the model itself necessarily requires a reload, but some config values, like the batching variables, could be changed on the fly.

saeid93 avatar Jan 11 '23 04:01 saeid93

As far as I know, model reloading should happen gracefully. As in, it won't replace the model (i.e. unload the old one) until the new version is up and ready. That was, at least, the intention.

Have you noticed downtimes when reloading models?

adriangonz avatar Jan 11 '23 09:01 adriangonz

I think this logic belongs in Kubernetes. You can write a microservice that observes the model's traffic or stats and changes the batch-size env var MLSERVER_MODEL_MAX_BATCH_SIZE on the pod; Kubernetes will then create a new pod and terminate the old one (see the sketch below). The ideal option would be to write a dedicated KEDA plugin for this.
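
A minimal sketch of that approach, assuming a Deployment named mlserver (hypothetical) and the default rolling-update behaviour:

```bash
# Update the batching env var on the Deployment; Kubernetes rolls out a new
# pod with the new value and terminates the old one once it is ready.
kubectl set env deployment/mlserver MLSERVER_MODEL_MAX_BATCH_SIZE=64
kubectl rollout status deployment/mlserver
```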

gawsoftpl avatar Jan 17 '23 14:01 gawsoftpl

Hey @gawsoftpl,

That would totally be the way to handle this in single-model serving scenarios. However, in multi-model serving scenarios, the server becomes a stateful component that manages the model's lifecycle itself.

Having said that, the approach in this case should be similar. That is, you just update your settings and "spin up a new model" (i.e. by sending a new /load request to MLServer), which should reload the model gracefully within the same MLServer pod.
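
For illustration, a sketch of that flow, assuming the model repository endpoints are exposed on MLServer's default HTTP port (8080) and a model named my-model:

```bash
# Ask MLServer to (re)load the model after its settings have been updated;
# the old instance keeps serving until the new one is ready.
curl -s -X POST http://localhost:8080/v2/repository/models/my-model/load
```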

adriangonz avatar Jan 19 '23 15:01 adriangonz

Hey @saeid93 ,

Following up on this one, have you had a chance to check if you can see any downtimes when reloading models? As discussed in https://github.com/SeldonIO/MLServer/issues/932#issuecomment-1378467352, the intention is that model reloading should happen gracefully.

adriangonz avatar Apr 20 '23 15:04 adriangonz

Hey @adriangonz , The model can be re-loaded in the runtime with minimal disruption using this. However, the problem is that I couldn't find any way to modify the model settings through that REST interface in a containerized MLServer, and my original intention is to reload the model with a new batch size. As a hack, I even exec'ed into the container and changed model-settings.json with the new batch size (roughly as sketched below), but it seems the model gets reloaded with the original settings. Is there a way to inject new settings through the /load REST request?
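
Roughly, what I tried looks like this (pod name, model path and batch-size values are just placeholders):

```bash
# Patch max_batch_size inside the running container, then ask MLServer to
# reload the model via the repository endpoint.
kubectl exec mlserver-0 -- \
  sed -i 's/"max_batch_size": 32/"max_batch_size": 64/' /mnt/models/model-settings.json
curl -s -X POST http://localhost:8080/v2/repository/models/my-model/load
```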

saeid93 avatar May 01 '23 23:05 saeid93