[Feature] [Optimum] [Intel] [OpenVINO] Add OpenVINO backend support through Optimum-Intel
Description
This PR integrates the OpenVINO backend into Infinity's Optimum Embedder class via the optimum-intel library.
Related Issue
If applicable, link the issue this PR addresses.
Types of Change
- [X] New feature
- [ ] Documentation update
Checklist
- [X] I have read the CONTRIBUTING guidelines.
- [X] My code follows the code style of this project.
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
- [X] My changes generate no new warnings.
- [ ] I have updated the documentation accordingly.
Additional Notes
Several inference precisions can be specified through the ov_config in libs/infinity_emb/infinity_emb/transformer/utils_optimum.py:
"ov_config":{
"INFERENCE_PRECISION_HINT": "bf16" # it supports fp32, fp16 and bf16
}
The inference precision hint is hardcoded to bf16 because it offers the fastest inference speed.
We also ran an MTEB evaluation (bankclassification dataset) on the INT4 weight-only quantized model with BF16 inference precision; the drop in accuracy is only 0.71%.
Given the speed/accuracy tradeoff and the ease of use, we think that settling on a single effective configuration would improve the user experience of infinity_emb.
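For reviewers who would rather keep the precision configurable, here is a minimal sketch of how the hint could be validated before it is placed in ov_config. The helper name make_ov_config and the SUPPORTED_PRECISIONS tuple are hypothetical and not part of this PR; only the INFERENCE_PRECISION_HINT key and the three precision values come from the note above.

```python
# Hypothetical helper, not part of this PR: builds the dict used as "ov_config".
# The supported values are the three listed in the note above.
SUPPORTED_PRECISIONS = ("fp32", "fp16", "bf16")


def make_ov_config(precision: str = "bf16") -> dict:
    """Return an ov_config dict with a validated inference precision hint.

    Defaults to bf16, the value this PR hardcodes.
    """
    normalized = precision.lower()
    if normalized not in SUPPORTED_PRECISIONS:
        raise ValueError(
            f"Unsupported precision {precision!r}; "
            f"expected one of {SUPPORTED_PRECISIONS}"
        )
    return {"INFERENCE_PRECISION_HINT": normalized}


if __name__ == "__main__":
    # Default matches the hardcoded setting described above.
    print(make_ov_config())
    # A different precision could be requested explicitly.
    print(make_ov_config("fp32"))
```

The returned dict would go wherever utils_optimum.py currently sets "ov_config". How (or whether) Infinity should expose the precision to users is an open design question, and the PR as submitted deliberately keeps a single fixed value.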
License
By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.
Codecov Report
Attention: Patch coverage is 44.15584% with 43 lines in your changes missing coverage. Please review.
Project coverage is 78.20%. Comparing base (c9a8404) to head (295e840). Report is 8 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #454 +/- ##
==========================================
- Coverage 79.18% 78.20% -0.99%
==========================================
Files 41 41
Lines 3248 3308 +60
==========================================
+ Hits 2572 2587 +15
- Misses 676 721 +45