Make MMS more user-friendly at start-up
For certain use-cases, for example when you prefer to ship a Docker image as the artifact rather than relying only on the .mar format (that can be the case when your models depend on external libraries and packages for pre/post-processing, or when they integrate with another part of the stack), or when your hardware is set to scale with the number of running containers, MMS has some functionality gaps:
- There is no simple way to detect a failed model load with MMS
- currently you can rely on looking for 'ERROR' and 'WARNING' in the logs
- you can diff the output of the management API's `:8081/models` endpoint against the list of models that are supposed to be loaded (see the sketch after this list)
--> Ideally I would prefer to be able to start MMS in a synchronous manner that returns -1 if MMS encounters errors at start-up. That way, when deploying a container, you could easily make sure that start-up worked without problems.
- There are gaps in the CLI commands compared to the REST API
- You cannot dynamically add models after startup
--> It would be great to have something like `mms model add MODEL_NAME=MODEL_PATH --num_workers=3`, etc.
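For the start-up check, here is a minimal sketch of what I do today, assuming the default management API on `localhost:8081` and a `GET /models` listing response; the expected model names and the exact JSON field names are assumptions to adapt to your setup.

```python
import json
import sys
import urllib.request

# Models we expect MMS to have loaded at start-up (hypothetical names).
EXPECTED_MODELS = {"resnet-18", "my-custom-model"}

# Default MMS management endpoint; adjust host/port if yours is configured differently.
MANAGEMENT_URL = "http://localhost:8081/models"


def loaded_models():
    """Return the set of model names reported by the management API."""
    with urllib.request.urlopen(MANAGEMENT_URL, timeout=5) as resp:
        payload = json.load(resp)
    # Assumes the listing response contains a "models" array of
    # {"modelName": ..., "modelUrl": ...} entries.
    return {m["modelName"] for m in payload.get("models", [])}


def main():
    missing = EXPECTED_MODELS - loaded_models()
    if missing:
        print("Missing models: " + ", ".join(sorted(missing)), file=sys.stderr)
        sys.exit(1)  # non-zero exit so the container orchestrator can fail fast
    print("All expected models are loaded.")


if __name__ == "__main__":
    main()
```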
- The CLI is only for startup; it is not meant for management. Use the management API to manage models.
- A model specified at startup time will keep retrying to load in the background.
- You can use the API to check a model's status.
- You can use the API to manage those models after MMS startup.
- There might be multiple models, so MMS should keep running even if one model fails to load.
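To illustrate that API-based workflow, here is a rough sketch of registering a model and scaling its workers after MMS has started; the .mar URL and model name are placeholders, and the query parameters (`url`, `initial_workers`, `synchronous`, `min_worker`) follow my reading of the management API documentation, so double-check them against the docs.

```python
import urllib.parse
import urllib.request

MANAGEMENT = "http://localhost:8081"  # default MMS management address


def register_model(mar_url, initial_workers=1):
    """Register a model archive with a running MMS instance (POST /models)."""
    query = urllib.parse.urlencode({
        "url": mar_url,                      # location of the .mar archive
        "initial_workers": initial_workers,  # workers to spin up right away
        "synchronous": "true",               # wait for the workers before returning
    })
    req = urllib.request.Request(f"{MANAGEMENT}/models?{query}", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


def scale_workers(model_name, min_worker):
    """Adjust the worker count of a registered model (PUT /models/{name})."""
    query = urllib.parse.urlencode({"min_worker": min_worker})
    req = urllib.request.Request(
        f"{MANAGEMENT}/models/{model_name}?{query}", method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


# Placeholder model; replace with a real .mar URL and name.
print(register_model("https://example.com/squeezenet_v1.1.mar", initial_workers=2))
print(scale_workers("squeezenet_v1.1", min_worker=3))
```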
Now you can control the number of workers per model in config.properties via the default_workers_per_model setting.
@frankfliu, I suggest you update the MMS documentation to describe this parameter and what it does.
You can find documentation about default_workers_per_model in: https://github.com/awslabs/mxnet-model-server/blob/master/docs/configuration.md
And an example here: https://github.com/awslabs/mxnet-model-server/blob/master/docker/config.properties#L8
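For anyone landing on this thread, the relevant line in config.properties looks roughly like the following; the value of 2 is just an illustrative choice, not a recommended default.

```properties
# config.properties (illustrative value)
# Cap each model at two workers at start-up.
default_workers_per_model=2
```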
Thanks, @frankfliu.
@ThomasDelteil, does this sound good to you? If this addresses your concerns/suggestions, could you please close the issue? If not, could you please provide your feedback on @frankfliu's recommendations?
This solved my issue and limited the number of workers started to what my GPU could handle.