HuggingFace Optimum Runtime extensions
As a follow-up to the initial PR that introduced the HuggingFace Optimum Runtime (#4081), we have identified a set of tasks to improve the servers:
- [ ] Extend complex inputs / outputs to use the v2 protocol (see the request sketch after this list)
- [ ] Support loading artifacts from the provided model URI (see the loading sketch below)
- [x] Ensure the Optimum library is pinned once v1.2 is released (currently on master) - solved via https://github.com/SeldonIO/MLServer/pull/580
- [x] Fix running with the parallel server (currently a single server by default)
- [ ] Allow further pipeline parameters to be provided via config (e.g. device, or all via kwargs; see the config sketch below)
- [x] Ensure the MLServer liveness probe works before loading the first model, to avoid timeouts (a sketch of the pattern follows the list)
- [x] Update the runtime to use SUPPORTED_TASKS from the optimum pipelines module, as per https://github.com/huggingface/optimum/issues/172 (see the validation sketch below)
- [x] Ensure TRANSFORMERS_CACHE is respected, and remove the workaround once https://github.com/huggingface/optimum/issues/186 is addressed (see the cache sketch below)
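
For the v2 protocol item, a minimal sketch of what a multi-input request could look like. The endpoint path and payload shape follow the standard v2 inference protocol; the model name `optimum-model` and the input names are hypothetical:

```python
import requests

# Hypothetical model name; the endpoint path follows the v2 inference protocol
inference_url = "http://localhost:8080/v2/models/optimum-model/infer"

# Two named inputs, e.g. for a question-answering pipeline
payload = {
    "inputs": [
        {
            "name": "question",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is MLServer?"],
        },
        {
            "name": "context",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["MLServer is an inference server for machine learning models."],
        },
    ]
}

response = requests.post(inference_url, json=payload)
print(response.json()["outputs"])
```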
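For loading artifacts from a model URI, a minimal sketch of how a runtime's `load()` could resolve local artifacts, assuming MLServer's `MLModel` base class and `get_model_uri` helper; the `OptimumRuntime` class name is hypothetical:

```python
from mlserver import MLModel
from mlserver.utils import get_model_uri
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class OptimumRuntime(MLModel):  # hypothetical runtime name
    async def load(self) -> bool:
        # Resolve the `uri` field from model-settings.json to a local path
        model_uri = await get_model_uri(self._settings)

        # Load the artifacts from the resolved path instead of the Hub
        self._tokenizer = AutoTokenizer.from_pretrained(model_uri)
        self._model = AutoModelForSequenceClassification.from_pretrained(model_uri)

        return True
```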
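For forwarding extra pipeline parameters from config, a sketch assuming the runtime reads them from the `parameters.extra` section of `model-settings.json` and passes them straight through to `transformers.pipeline`; the key names under `extra` are assumptions, not the runtime's actual schema:

```python
import json
from transformers import pipeline

# Illustrative model-settings.json contents; the keys under `extra`
# are assumptions about how pipeline kwargs might be exposed
model_settings = json.loads("""
{
    "name": "optimum-model",
    "parameters": {
        "extra": {
            "task": "text-classification",
            "device": -1
        }
    }
}
""")

# Forward everything in `extra` to the pipeline constructor
extra = dict(model_settings["parameters"]["extra"])
task = extra.pop("task")
pl = pipeline(task, **extra)
print(pl("MLServer makes serving models easier"))
```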
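The liveness-probe fix is about ordering: the HTTP server (and its `/v2/health/live` endpoint) should come up before the first, potentially slow, model load. A generic sketch of that pattern using FastAPI, not MLServer's actual internals:

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()
model_ready = False


@app.get("/v2/health/live")
async def live():
    # Liveness answers immediately, even while the model is still loading
    return {"live": True}


@app.get("/v2/health/ready")
async def ready():
    # Readiness only flips once the (slow) load has finished
    return {"ready": model_ready}


@app.on_event("startup")
async def schedule_load():
    # Kick off the model load in the background instead of blocking startup
    asyncio.create_task(load_model())


async def load_model():
    global model_ready
    await asyncio.sleep(30)  # stand-in for a slow pipeline download / load
    model_ready = True
```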
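For the SUPPORTED_TASKS item, a sketch of validating a configured task against Optimum's own registry rather than a hardcoded list; the exact import path is an assumption and may differ across optimum versions:

```python
# Import path is an assumption; check your optimum version
from optimum.pipelines import SUPPORTED_TASKS


def validate_task(task: str) -> None:
    # Fail fast if the configured task is not one optimum can pipeline
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task '{task}'. Supported tasks: {sorted(SUPPORTED_TASKS)}"
        )


validate_task("text-classification")
```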
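For the TRANSFORMERS_CACHE item, one way to honour the environment variable explicitly is to pass it as `cache_dir` when loading artifacts; a minimal, hedged illustration of the idea (the model id is illustrative):

```python
import os
from transformers import AutoModel

# Respect TRANSFORMERS_CACHE if set; fall back to the library default otherwise
cache_dir = os.environ.get("TRANSFORMERS_CACHE")

model = AutoModel.from_pretrained(
    "distilbert-base-uncased",  # illustrative model id
    cache_dir=cache_dir,
)
```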