HuggingFace Optimum Runtime extensions
As a follow-up to the initial PR that introduced the HuggingFace Optimum Runtime (#4081), we have identified a set of tasks to improve the servers:
- [ ] Extend complex inputs / outputs to use the v2 protocol (see the request sketch after this list)
- [ ] Support loading artifacts from the provided model URI (see the loading sketch below)
- [x] Ensure the Optimum library is pinned once v1.2 is released (currently on master) - solved via https://github.com/SeldonIO/MLServer/pull/580
- [x] Fix running with the parallel server (currently a single server by default)
- [ ] Allow further pipeline parameters to be provided via config (e.g. device, or all via kwargs; see the config sketch below)
- [x] Ensure the MLServer liveness probe works before loading the first model, to avoid timeouts (a sketch of the pattern follows the list)
- [x] Update the runtime to use SUPPORTED_TASKS from the optimum pipelines module, as per https://github.com/huggingface/optimum/issues/172 (see the validation sketch below)
- [x] Ensure TRANSFORMERS_CACHE is respected, and remove the workaround once https://github.com/huggingface/optimum/issues/186 is addressed (see the cache sketch below)
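
For the v2 protocol item, a minimal sketch of what a multi-input request could look like. The endpoint path and payload shape follow the standard v2 inference protocol; the model name `optimum-model` and the input names are hypothetical:

```python
import requests

# Hypothetical model name; the endpoint path follows the v2 inference protocol
inference_url = "http://localhost:8080/v2/models/optimum-model/infer"

# Two named inputs, e.g. for a question-answering pipeline
payload = {
    "inputs": [
        {
            "name": "question",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is MLServer?"],
        },
        {
            "name": "context",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["MLServer is an inference server for machine learning models."],
        },
    ]
}

response = requests.post(inference_url, json=payload)
print(response.json()["outputs"])
```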
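For loading artifacts from a model URI, a minimal sketch of how a runtime's `load()` could resolve local artifacts, assuming MLServer's `MLModel` base class and `get_model_uri` helper; the `OptimumRuntime` class name is hypothetical:

```python
from mlserver import MLModel
from mlserver.utils import get_model_uri
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class OptimumRuntime(MLModel):  # hypothetical runtime name
    async def load(self) -> bool:
        # Resolve the `uri` field from model-settings.json to a local path
        model_uri = await get_model_uri(self._settings)

        # Load the artifacts from the resolved path instead of the Hub
        self._tokenizer = AutoTokenizer.from_pretrained(model_uri)
        self._model = AutoModelForSequenceClassification.from_pretrained(model_uri)

        return True
```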
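For forwarding extra pipeline parameters from config, a sketch assuming the runtime reads them from the `parameters.extra` section of `model-settings.json` and passes them straight through to `transformers.pipeline`; the key names under `extra` are assumptions, not the runtime's actual schema:

```python
import json
from transformers import pipeline

# Illustrative model-settings.json contents; the keys under `extra`
# are assumptions about how pipeline kwargs might be exposed
model_settings = json.loads("""
{
    "name": "optimum-model",
    "parameters": {
        "extra": {
            "task": "text-classification",
            "device": -1
        }
    }
}
""")

# Forward everything in `extra` to the pipeline constructor
extra = dict(model_settings["parameters"]["extra"])
task = extra.pop("task")
pl = pipeline(task, **extra)
print(pl("MLServer makes serving models easier"))
```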
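The liveness-probe fix is about ordering: the HTTP server (and its `/v2/health/live` endpoint) should come up before the first, potentially slow, model load. A generic sketch of that pattern using FastAPI, not MLServer's actual internals:

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()
model_ready = False


@app.get("/v2/health/live")
async def live():
    # Liveness answers immediately, even while the model is still loading
    return {"live": True}


@app.get("/v2/health/ready")
async def ready():
    # Readiness only flips once the (slow) load has finished
    return {"ready": model_ready}


@app.on_event("startup")
async def schedule_load():
    # Kick off the model load in the background instead of blocking startup
    asyncio.create_task(load_model())


async def load_model():
    global model_ready
    await asyncio.sleep(30)  # stand-in for a slow pipeline download / load
    model_ready = True
```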
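For the SUPPORTED_TASKS item, a sketch of validating a configured task against Optimum's own registry rather than a hardcoded list; the exact import path is an assumption and may differ across optimum versions:

```python
# Import path is an assumption; check your optimum version
from optimum.pipelines import SUPPORTED_TASKS


def validate_task(task: str) -> None:
    # Fail fast if the configured task is not one optimum can pipeline
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task '{task}'. Supported tasks: {sorted(SUPPORTED_TASKS)}"
        )


validate_task("text-classification")
```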
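For the TRANSFORMERS_CACHE item, one way to honour the environment variable explicitly is to pass it as `cache_dir` when loading artifacts; a minimal, hedged illustration of the idea (the model id is illustrative):

```python
import os
from transformers import AutoModel

# Respect TRANSFORMERS_CACHE if set; fall back to the library default otherwise
cache_dir = os.environ.get("TRANSFORMERS_CACHE")

model = AutoModel.from_pretrained(
    "distilbert-base-uncased",  # illustrative model id
    cache_dir=cache_dir,
)
```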