Speed up scoring in NimbusML
Consider this Python-based prediction server code:
One-time setup:

    model = Pipeline()
    model.load_model('model.zip')
Numerous subsequent calls to predict:

    model.predict(data)
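Concretely, such a server loads the model once at startup and scores on every request. A minimal sketch, assuming Flask and JSON row input (both are illustrative choices, not part of the proposal):

    import pandas as pd
    from flask import Flask, request, jsonify
    from nimbusml import Pipeline

    app = Flask(__name__)

    # One-time setup at process start.
    model = Pipeline()
    model.load_model('model.zip')

    @app.route('/predict', methods=['POST'])
    def predict():
        # Per-request scoring; columns must match the model's input schema.
        data = pd.DataFrame(request.get_json())
        return jsonify(model.predict(data).to_dict(orient='records'))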
Currently, every call to predict(data) forces ML.NET to load the physical model file and set up the transformer/trainer pipeline chain before scoring.
If we cache this chain object on the ML.NET side during the model.load_model('model.zip') call and reuse it in model.predict(data), predict() will speed up significantly.
This is a real customer scenario: the measured predict() itself takes less than 5 ms, while loading the model takes more than 300 ms.
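The gap is easy to see by timing the calls. A minimal sketch, where the input frame is a placeholder whose columns must match the model's schema:

    import time
    import pandas as pd
    from nimbusml import Pipeline

    data = pd.DataFrame({'feature': [1.0]})  # placeholder input row

    model = Pipeline()
    model.load_model('model.zip')

    # Today every predict() call pays the full reload, so per-call latency
    # is dominated by the >300 ms load rather than the <5 ms scoring.
    for _ in range(3):
        start = time.perf_counter()
        model.predict(data)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print('predict call took %.0f ms' % elapsed_ms)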
The approximate design changes are:
Specify a cache=True parameter in load_model():

    model.load_model('model.zip', cache=True)
cache=True forces the model to be loaded in ML.NET. This runs a new EntryPoint in ML.NET that loads the model and keeps it in memory for later use. load_model() remains a void function with no return value: the model_cache_id is not returned to the caller, but is recorded in the pipeline object for use by subsequent predictions.
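A minimal Python-side sketch of the proposed bookkeeping; everything here, from the _model_cache_id attribute to the entry-point stand-in, is hypothetical, since the EntryPoint does not exist yet:

    import itertools

    _next_cache_id = itertools.count(1)
    _chain_cache = {}  # toy stand-in for chains ML.NET would keep in memory

    def _entry_point_load_and_cache(model_file):
        # Stands in for the proposed EntryPoint: load the model file once,
        # build the transformer/trainer chain, keep it resident, return a handle.
        cache_id = next(_next_cache_id)
        _chain_cache[cache_id] = ('chain built from', model_file)
        return cache_id

    class Pipeline:
        def load_model(self, src, cache=False):
            self.model = src
            self._model_cache_id = None
            if cache:
                self._model_cache_id = _entry_point_load_and_cache(src)
            # Still void: the cache id is recorded on the pipeline, not returned.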
Call predict() as usual:

    model.predict(data)

No change to predict()'s signature. Under the hood, since the pipeline was loaded with caching enabled, this calls a new/modified EntryPoint in ML.NET that scores with the already loaded model instead of reloading it.
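Continuing the sketch above, predict() would dispatch on the recorded id (the entry-point names are again hypothetical):

    def _entry_point_score_cached(cache_id, data):
        # Proposed path: score with the chain already resident in ML.NET.
        return 'scored %d rows with cached chain #%d' % (len(data), cache_id)

    def _entry_point_score_from_file(model_file, data):
        # Current path: reload the file, rebuild the chain, then score.
        return 'scored %d rows after reloading %s' % (len(data), model_file)

    class Pipeline(Pipeline):  # adds predict() to the sketched class above
        def predict(self, data):
            if self._model_cache_id is not None:
                return _entry_point_score_cached(self._model_cache_id, data)
            return _entry_point_score_from_file(self.model, data)

    # Usage mirroring the proposal: the chain is built once, then reused.
    model = Pipeline()
    model.load_model('model.zip', cache=True)
    print(model.predict([{'feature': 1.0}]))  # fast path via the cached chain

Keeping the id inside the pipeline object rather than returning it preserves the existing load_model()/predict() contract, so server code like the example at the top only needs the added cache=True flag.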