Alexander Kozlov comments

Results: 41 comments of Alexander Kozlov

Different model output after compression in NNCFNetwork and .onnx/.xml

Thanks, I noticed a couple of problems. One is related to the NNCF quantization: ![image](https://user-images.githubusercontent.com/25342812/193613967-2885e6c2-fbec-4a5f-8d02-b6e38c5aab73.png) FakeQuantize should not be propagated through the ReLU. This leads to the fact that we...

Intel OpenVINO backend

I think we should consider 3 options in this integration:
- [ ] Convert the model to OpenVINO IR and run it as is
- [ ] Use OpenVINO 8-bit...

Intel OpenVINO backend

Regarding the first two, "(1) convert model from PyTorch on the fly (2) convert model from TensorFlow on the fly", are you going to use OVTF and anything handmade for...

[POT]: Simplified PTQ API (prototype)

cc'ed @alexsu52

[POT]: Simplified PTQ API (prototype)

@helena-intel, can you please take a look as well?

[POT]: Simplified PTQ API (prototype)

@helena-intel, thanks for the comments. This is a POC for now. BTW, I have some concerns regarding accuracy-aware quantization and I am going to revise its API soon. The main...

Out of memory for magnitude sparsity algo with stable diffusion model

@vshampor, @ljaljushkin can you please take a look?

Align NPU to CPU

Just for the record, the main motivation for keeping the config is QAT for NPU, which has some custom features such as W4A4 support.

LLAMA_CPP notebook with Qwen-7B-Chat

LGTM, thanks.

Does it support the chatglm-6b model? Hope to support Chatglm 6b

@junior-zsy, @tsaizehua, take a look here: https://github.com/huggingface/optimum/pull/1479