Alexander Kozlov
Thanks, I noticed a couple of problems. One is related to the NNCF quantization: FakeQuantize should not be propagated through the ReLU. This leads to the fact that we...
I think we should consider 3 options in this integration:
- [ ] Convert the model to OpenVINO IR and run it as is
- [ ] Use OpenVINO 8-bit...
Regarding the first two, "(1) convert model from PyTorch on the fly (2) convert model from TensorFlow on the fly", are you going to use OVTF and anything handmade for...
cc'ed @alexsu52
@helena-intel, can you please take a look as well?
@helena-intel, thanks for the comments. This is a POC for now. BTW, I have some concerns regarding accuracy-aware quantization and I am going to revise its API soon. The main...
@vshampor, @ljaljushkin can you please take a look?
Just for the record, the main motivation for keeping the config is QAT for NPU which has some custom features such as W4A4 support.
LGTM, thanks.
@junior-zsy, @tsaizehua, take a look here: https://github.com/huggingface/optimum/pull/1479