
Concurrency support using model clone

Open dtrawins opened this issue 1 year ago • 2 comments

Support for concurrent execution

  • Adds a clone operation to model objects. A clone creates a new execution context without duplicating the compiled_model, so memory usage does not grow with each clone. This enables concurrent inference in multithreaded applications.
  • The request attribute is now deprecated. The new attributes compiled_model and infer_request replace it and match the corresponding OpenVINO objects.
  • Improves performance for decoder models by no longer creating a new request for each inference.
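
The pattern described above can be illustrated with a minimal, library-free sketch. It is not the code from this PR: `FakeCompiledModel`, `FakeInferRequest`, `Model`, and `run_concurrently` are hypothetical stand-ins used only to show the idea that a clone shares the heavy compiled model while owning its own lightweight infer request, which it then reuses across calls.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class FakeCompiledModel:
    """Hypothetical stand-in for a compiled model: heavy, shared by all clones."""

    def __init__(self, scale):
        self.scale = scale

    def create_infer_request(self):
        return FakeInferRequest(self)


class FakeInferRequest:
    """Hypothetical stand-in for an infer request: light, holds per-caller state."""

    def __init__(self, compiled_model):
        self.compiled_model = compiled_model
        self.output = None

    def infer(self, value):
        # State is written on the request, so a request must not be shared
        # between threads that run at the same time.
        self.output = value * self.compiled_model.scale
        return self.output


class Model:
    """Hypothetical model wrapper exposing compiled_model and infer_request."""

    def __init__(self, compiled_model):
        self.compiled_model = compiled_model
        # Created once and reused for every call, not once per inference.
        self.infer_request = compiled_model.create_infer_request()

    def clone(self):
        # The shallow copy keeps a reference to the same compiled_model;
        # only the infer request (the execution context) is new.
        cloned = copy.copy(self)
        cloned.infer_request = self.compiled_model.create_infer_request()
        return cloned

    def __call__(self, value):
        return self.infer_request.infer(value)


def run_concurrently(model, inputs, max_workers=4):
    """Run each input on its own clone so threads never share a request."""

    def worker(value):
        return model.clone()(value)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, inputs))
```

In a real application each worker thread would typically hold one clone for its whole lifetime rather than cloning per input; the per-input clone here only keeps the sketch short.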

Before submitting

  • [x] Did you make sure to update the documentation with your changes?
  • [x] Did you write any new necessary tests?

dtrawins · Feb 16 '24 15:02

This PR is a replacement for https://github.com/huggingface/optimum-intel/pull/519

dtrawins · Feb 16 '24 15:02

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.