HuggingFace Optimum Runtime extensions

Open axsaucedo opened this issue 3 years ago • 0 comments
As a follow-up to the initial PR that introduced the HuggingFace Optimum Runtime (#4081), we have identified a set of follow-up tasks to improve the servers:

  • [ ] Extend for complex outputs / inputs to use v2 protocol
  • [ ] Support loading artifacts from provided model-URI
  • [x] Ensure the Optimum library is pinned once v1.2 is released (currently on master); solved via https://github.com/SeldonIO/MLServer/pull/580
  • [x] Fix the runtime to work with the parallel server (currently single server by default)
  • [ ] Allow further pipeline parameters to be provided via config (e.g. device, or all of them via kwargs)
  • [x] Ensure MLServer liveness probe works before loading first model to avoid timeout
  • [x] Update runtime to use SUPPORTED_TASKS from the optimum pipelines module as per https://github.com/huggingface/optimum/issues/172
  • [x] Ensure TRANSFORMERS_CACHE is respected and remove once updated https://github.com/huggingface/optimum/issues/186
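
For the first task (complex outputs / inputs over the v2 protocol), one possible approach is to serialise each structured pipeline result as a JSON string and carry it in a single BYTES tensor. The sketch below is only an illustration of that idea, not MLServer's actual codec: the helper names `encode_output` and `decode_output`, the tensor name `"output"`, and the sample values are all hypothetical. The only thing taken from the v2 protocol is the tensor shape of `name`, `shape`, `datatype`, and `data`.

```python
import json
from typing import Any, Dict, List


def encode_output(name: str, items: List[Any]) -> Dict[str, Any]:
    """Pack JSON-serialisable items into one v2-style BYTES tensor (hypothetical helper)."""
    return {
        "name": name,
        "shape": [len(items)],
        "datatype": "BYTES",
        # One JSON string per element, so nested dicts and lists survive transport.
        "data": [json.dumps(item) for item in items],
    }


def decode_output(tensor: Dict[str, Any]) -> List[Any]:
    """Reverse encode_output: parse each element back into a Python object."""
    return [json.loads(element) for element in tensor["data"]]


# Sample shaped like a question-answering pipeline result (values are made up).
example = [{"score": 0.9, "start": 0, "end": 6, "answer": "Seldon"}]
tensor = encode_output("output", example)
restored = decode_output(tensor)
```

The trade-off of this design is that the payload is opaque to generic v2 clients, which would have to know that each element is JSON in order to interpret it.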

axsaucedo, May 09 '22 12:05