optimum
Intel OpenVINO backend
What does this PR do?
This PR introduces an Intel OpenVINO backend for inference. It is a port of https://github.com/huggingface/transformers/pull/14203.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?
@mfuntowicz, @echarlaix, may I ask for review?
Hi! Just wanted to ask if you had a chance to review this proposal?
Hi @dkurt, thanks for the interesting PR. We are currently waiting for more visibility concerning our collaboration in order to decide which library and toolkit integrations we are prioritising. I will get back to you as soon as we have more information.
I think we should consider 3 options in this integration:
- [ ] Convert the model to OpenVINO IR and run it as is
- [ ] Use OpenVINO 8-bit post-training quantization via POT API (example is here)
- [ ] Use QAT from NNCF to get a more accurate optimized model
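For context on the second option, here is a minimal, purely illustrative sketch of what 8-bit post-training quantization does conceptually. This is not the POT API; it is just per-tensor symmetric int8 quantization in plain Python:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127].

    POT automates this per layer (and with more accurate schemes), using
    calibration data to pick the ranges; this sketch only shows the core idea.
    """
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # an all-zero tensor quantizes to zeros
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [qi * scale for qi in q]
```

The round-trip error per weight is bounded by half the scale, which is why 8-bit post-training quantization usually costs little accuracy while shrinking weights 4x versus float32.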
@AlexKoff88, the first point is already implemented, if I understood it correctly. There are three options: (1) convert the model from PyTorch on the fly, (2) convert the model from TensorFlow on the fly, (3) use an OpenVINO IR (downloaded from the hub or local). The only thing that is missing is caching (I will add it in the next commit).
Regarding POT and NNCF, do you expect them as a separate module or as part of `optimum.intel.openvino`?
Regarding the first two, "(1) convert model from PyTorch on the fly, (2) convert model from TensorFlow on the fly", are you going to use OVTF or anything handmade for PyTorch to accomplish that?
As for your last question, since both tools are in the OpenVINO ecosystem, it is worth having the optimization within `optimum.intel.openvino`.
> Regarding the first two, "(1) convert model from PyTorch on the fly, (2) convert model from TensorFlow on the fly", are you going to use OVTF or anything handmade for PyTorch to accomplish that?
No, actually. Just the OpenVINO runtime: through ONNX for PyTorch models and Model Optimizer for TensorFlow ones. Check `load_ov_model_from_pytorch` and `load_ov_model_from_tf` from the PR. Sorry if I misunderstood the question.
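To make the routing described above explicit, here is a small sketch of how a loader might decide which conversion path each source framework takes. This is illustrative only; the PR's actual helpers are `load_ov_model_from_pytorch` and `load_ov_model_from_tf`:

```python
def pick_conversion_path(framework: str) -> str:
    """Describe the conversion toolchain used for each source framework.

    PyTorch models are first exported to ONNX, which the OpenVINO runtime
    can read; TensorFlow models go through OpenVINO's Model Optimizer;
    pre-converted IR files are loaded as-is.
    """
    routes = {
        "pytorch": "ONNX export -> OpenVINO runtime",
        "tensorflow": "Model Optimizer -> OpenVINO IR",
        "openvino": "load IR as-is",
    }
    try:
        return routes[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework}")
```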
Hi! I've published all the work at https://github.com/dkurt/optimum-openvino (OpenVINO runtime and NNCF). Should I propose a reference in `setup.py` similar to Graphcore, or update this PR to include NNCF?
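If the `setup.py` reference route is taken, it would presumably be an extras entry along these lines (the package name comes from the linked repo; the exact shape of the entry is an assumption, mirroring the Graphcore pattern):

```python
# Hypothetical extras entry for optimum's setup.py; the actual key and
# pinning would be decided by the maintainers.
extras = {
    "openvino": ["optimum-openvino"],  # https://github.com/dkurt/optimum-openvino
}
```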
An alternative solution based on a standalone package: https://github.com/huggingface/optimum/pull/64.