Results: 41 comments by Alexander Kozlov

Thanks, I noticed a couple of problems. One is related to the NNCF quantization:
![image](https://user-images.githubusercontent.com/25342812/193613967-2885e6c2-fbec-4a5f-8d02-b6e38c5aab73.png)
FakeQuantize should not be propagated through the ReLU. This leads to the fact that we...

I think we should consider 3 options in this integration:
- [ ] Convert the model to OpenVINO IR and run it as is
- [ ] Use OpenVINO 8-bit...

Regarding the first two, "(1) convert model from PyTorch on the fly (2) convert model from TensorFlow on the fly": are you going to use OVTF and anything handmade for...

@helena-intel, can you please take a look as well?

@helena-intel, thanks for the comments. This is a POC for now. BTW, I have some concerns regarding accuracy-aware quantization and I am going to revise its API soon. The main...

@vshampor, @ljaljushkin can you please take a look?

Just for the record, the main motivation for keeping the config is QAT for NPU which has some custom features such as W4A4 support.

@junior-zsy, @tsaizehua, take a look here: https://github.com/huggingface/optimum/pull/1479