Concurrency support using model clone
Adds support for executing a model with multiple concurrent requests.
- Adds a `clone()` operation to model objects that creates a new execution context without duplicating the `compiled_model` or its memory. This enables concurrent inference in multithreaded applications (see the sketch after this list).
- The `request` attribute is now deprecated; new `compiled_model` and `infer_request` attributes are added instead, matching the corresponding OpenVINO objects.
- Improves performance for decoder models by no longer creating a new inference request for each inference call.
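As a rough illustration of the intended usage, the sketch below shares one compiled model across threads by giving each worker its own cloned execution context. The checkpoint name, the `generate` helper, and the `max_new_tokens` value are illustrative, and the exact `clone()` signature may differ from what this PR finally exposes.

```python
# Minimal sketch: concurrent generation with cloned execution contexts.
from concurrent.futures import ThreadPoolExecutor

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

def generate(prompt: str) -> str:
    # clone() is assumed to return a lightweight copy that shares the
    # compiled_model but owns its own infer_request, so each thread can
    # run inference without locking or recompiling.
    local_model = model.clone()
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = local_model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

prompts = ["Hello, my name is", "OpenVINO makes inference", "Concurrency is"]
with ThreadPoolExecutor(max_workers=3) as pool:
    for text in pool.map(generate, prompts):
        print(text)
```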
Before submitting
- [x] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?
This PR is a replacement for https://github.com/huggingface/optimum-intel/pull/519