
Speed up scoring in NimbusML

Open · ganik opened this issue on Jun 17 '19 · 1 comment

Consider this Python-based prediction server code:

One-time setup:

```python
from nimbusml import Pipeline

model = Pipeline()
model.load_model('model.zip')
```

Numerous subsequent calls to predict:

```python
model.predict(data)
```

Currently, every call to predict(data) makes ML.NET load the physical model file and set up the transformer/trainer pipeline chain before scoring.

If we cache this chain object on the ML.NET side during the model.load_model('model.zip') call and then reuse it in model.predict(data), predict() will speed up significantly.

This is a real customer scenario: the measured predict() call itself takes less than 5 ms, while loading the model takes more than 300 ms.
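A minimal sketch of how the overhead can be observed, assuming a model.zip on disk and a placeholder data frame shaped for the model (the 5 ms / 300 ms figures come from the customer measurement above):

```python
import time

import pandas as pd
from nimbusml import Pipeline

data = pd.DataFrame({'feature': [0.1, 0.2, 0.3]})  # placeholder input

model = Pipeline()

# One-time load cost (reported as > 300 ms).
start = time.perf_counter()
model.load_model('model.zip')
print(f"load_model: {(time.perf_counter() - start) * 1e3:.1f} ms")

# A single scoring call (reported as < 5 ms once the chain is in
# memory; today the reload cost is paid on this call as well).
start = time.perf_counter()
model.predict(data)
print(f"predict:    {(time.perf_counter() - start) * 1e3:.1f} ms")
```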

— ganik · Jun 17 '19 21:06

The approximate design changes are:

- Specify a cache=True parameter in load_model(): model.load_model('model.zip', cache=True).
- cache=True forces the model to be loaded in ML.NET. This runs a new EntryPoint in ML.NET that loads the model and keeps it in memory for later use.
- load_model() remains a void function with no return value. model_cache_id is not returned to the caller; it is recorded on the pipeline object for use in predictions (see the sketch after this list).
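A minimal sketch of the Python-side bookkeeping this describes. The class and member names (CachingPipeline, model_cache_id, _load_entrypoint) are illustrative, not existing NimbusML APIs:

```python
class CachingPipeline:
    """Illustrative sketch only; names here are hypothetical,
    not existing NimbusML APIs."""

    def load_model(self, src, cache=False):
        self.model = src
        self.model_cache_id = None
        if cache:
            # Proposed: invoke a new ML.NET EntryPoint that loads the
            # model once and returns a handle to the in-memory chain.
            self.model_cache_id = self._load_entrypoint(src)
        # Still a void function: the cache id is recorded on the
        # pipeline object, not returned to the caller.

    def _load_entrypoint(self, src):
        # Stub standing in for the proposed EntryPoint call.
        return hash(src)
```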

model.predict(data): no change to predict() itself. Under the hood, since the pipeline loaded its model with caching enabled, this calls a new/modified EntryPoint in ML.NET that scores with the already-loaded model.
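Put together, the prediction server above would change only at load time. A sketch of the proposed usage (cache=True is the proposed parameter, not a shipped one):

```python
from nimbusml import Pipeline

model = Pipeline()
# Proposed: load once and keep the transformer/trainer chain
# alive in ML.NET for the lifetime of the process.
model.load_model('model.zip', cache=True)

def handle_request(data):
    # Hot path: reuses the cached chain, so only the ~5 ms scoring
    # cost is paid per request instead of the ~300 ms reload.
    return model.predict(data)
```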

— ganik · Jun 17 '19 21:06