optimum
Intel OpenVINO backend
What does this PR do?
This PR introduces an Intel OpenVINO backend for inference. It is a port of https://github.com/huggingface/transformers/pull/14203.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?
@mfuntowicz, @echarlaix, may I ask for review?
Hi! Just wanted to ask if you had a chance to review this proposal?
Hi @dkurt, thanks for the interesting PR. We are currently waiting for more visibility concerning our collaboration in order to decide which library and toolkit integrations we are prioritising. I will get back to you as soon as we have more information.
I think we should consider 3 options in this integration:
- [ ] Convert the model to OpenVINO IR and run it as is
- [ ] Use OpenVINO 8-bit post-training quantization via POT API (example is here)
- [ ] Use QAT from NNCF to get a more accurate optimized model
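For context on the second option, here is a minimal, purely illustrative sketch of what 8-bit post-training quantization does conceptually. This is not the POT API; it is just per-tensor symmetric int8 quantization in plain Python:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127].

    POT automates this per layer (and with more accurate schemes), using
    calibration data to pick the ranges; this sketch only shows the core idea.
    """
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # an all-zero tensor quantizes to zeros
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [qi * scale for qi in q]
```

The round-trip error per weight is bounded by half the scale, which is why 8-bit post-training quantization usually costs little accuracy while shrinking weights 4x versus float32.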
@AlexKoff88, the first point is already implemented, if I understood it correctly. There are three options: (1) convert the model from PyTorch on the fly, (2) convert the model from TensorFlow on the fly, (3) use an OpenVINO IR (downloaded from the hub or local). The only thing that is missing is caching (I will add it in the next commit).
Regarding POT and NNCF, do you expect them as a separate module or as part of `optimum.intel.openvino`?
Regarding the first two, "(1) convert model from PyTorch on the fly, (2) convert model from TensorFlow on the fly", are you going to use OVTF or anything handmade for PyTorch to accomplish that?
As for your last question, since both tools are in the OpenVINO ecosystem, it is worth having the optimization within `optimum.intel.openvino`.
> Regarding the first two, "(1) convert model from PyTorch on the fly, (2) convert model from TensorFlow on the fly", are you going to use OVTF or anything handmade for PyTorch to accomplish that?
No, actually. Just the OpenVINO runtime: through ONNX for PyTorch models and Model Optimizer for TensorFlow ones. Check `load_ov_model_from_pytorch` and `load_ov_model_from_tf` from the PR. Sorry if I misunderstood the question.
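To make the routing described above explicit, here is a small sketch of how a loader might decide which conversion path each source framework takes. This is illustrative only; the PR's actual helpers are `load_ov_model_from_pytorch` and `load_ov_model_from_tf`:

```python
def pick_conversion_path(framework: str) -> str:
    """Describe the conversion toolchain used for each source framework.

    PyTorch models are first exported to ONNX, which the OpenVINO runtime
    can read; TensorFlow models go through OpenVINO's Model Optimizer;
    pre-converted IR files are loaded as-is.
    """
    routes = {
        "pytorch": "ONNX export -> OpenVINO runtime",
        "tensorflow": "Model Optimizer -> OpenVINO IR",
        "openvino": "load IR as-is",
    }
    try:
        return routes[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework}")
```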
Hi! I've published all the work at https://github.com/dkurt/optimum-openvino (OpenVINO runtime and NNCF). Should I propose a reference in `setup.py` similar to Graphcore, or update this PR to include NNCF?
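If the `setup.py` reference route is taken, it would presumably be an extras entry along these lines (the package name comes from the linked repo; the exact shape of the entry is an assumption, mirroring the Graphcore pattern):

```python
# Hypothetical extras entry for optimum's setup.py; the actual key and
# pinning would be decided by the maintainers.
extras = {
    "openvino": ["optimum-openvino"],  # https://github.com/dkurt/optimum-openvino
}
```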
An alternative solution based on a standalone package: https://github.com/huggingface/optimum/pull/64.