[DRAFT]: Add FracBits experimental feature
Changes
Add a new mixed-precision QAT algorithm, FracBits ([paper], [code]), as an experimental feature.
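For context, a rough sketch of how a user might enable it through the usual NNCF config flow. The algorithm name `fracbits_quantization` and the knobs below are assumptions for illustration only; the actual keys are documented in the README added by this PR.

```python
import torch
from torchvision.models import mobilenet_v2

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

model = mobilenet_v2()

# Illustrative config; the algorithm name and parameters are assumptions,
# refer to the README in this PR for the real keys.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {
        "algorithm": "fracbits_quantization",  # assumed registration name
        "compression_rate": 1.5,               # hypothetical: target size reduction vs. 8-bit
    },
})

# Wrap the model with fake-quantization ops whose bit-widths are learnable,
# then fine-tune as in any other NNCF QAT pipeline.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)
print(compression_ctrl.statistics().to_str())
```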
Reason for changes
To expand the choice of mixed-precision QAT algorithms for users.
Related tickets
87363
Tests
Implemented in `tests/torch/experimental/fracbits`.
@vinnamkim, thanks for your contribution!
Some general questions:
Hi @alexsu52,
> Did you compare the implemented algorithm with existing NNCF algorithms?
I compared FracBits only with NNCF 8-bit QAT; the results are in the README.md included in this PR. They show that FracBits can reduce the total weight bit count (model size) by about 1.5x relative to NNCF 8-bit QAT for 3 models (MobileNet-V2, Inception-V3, and ResNet-50) and 2 datasets (ImageNet and CIFAR100) with comparable accuracy degradation (<1%).
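To be concrete about what "total weight bit count" means here: it is the sum over layers of (number of weight elements × assigned bit-width), and the reported ratio is that total under uniform 8-bit QAT divided by the total under FracBits. A tiny illustration with made-up layer sizes and bit assignments:

```python
# Hypothetical per-layer weight counts and FracBits-assigned bit-widths.
layers = [
    {"num_weights": 1_000_000, "bits": 4},
    {"num_weights": 500_000, "bits": 8},
    {"num_weights": 250_000, "bits": 6},
]

total_bits_fracbits = sum(l["num_weights"] * l["bits"] for l in layers)
total_bits_int8 = sum(l["num_weights"] * 8 for l in layers)

# Theoretical weight compression rate relative to uniform 8-bit QAT.
print(total_bits_int8 / total_bits_fracbits)  # ~1.47x for these made-up numbers
```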
> What user scenario do you cover?
I think it can be used by users who want to compress their model size further with mixed-precision QAT. Unlike HAWQ and AutoQ, it does not require a time-consuming initialization phase or an external exploration phase. However, it requires twice as many quantization forward-backward passes as vanilla QAT.
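The extra cost comes from the core idea of the paper: the bit-width is a learnable, fractional parameter, and each quantizer output interpolates between quantization at the floor and the ceiling of that value. A minimal PyTorch sketch of that interpolation (names and details are illustrative, not the code in this PR):

```python
import torch
import torch.nn as nn


def uniform_quantize(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization at an integer bit-width."""
    q_max = 2 ** (num_bits - 1) - 1
    scale = (x.detach().abs().max() / q_max).clamp(min=1e-8)
    y = x / scale
    y = y + (torch.round(y) - y).detach()  # straight-through estimator for round()
    return torch.clamp(y, -q_max, q_max) * scale


class FracBitsQuantizer(nn.Module):
    """Sketch of the fractional-bit quantizer idea from the FracBits paper: the output
    interpolates between quantization at floor(b) and ceil(b) bits, which keeps the
    bit-width b differentiable at the cost of two quantization passes per call."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))  # learnable fractional bit-width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b_lo = int(torch.floor(self.bits).item())
        b_hi = b_lo + 1
        frac = self.bits - b_lo  # gradient w.r.t. the bit-width flows through this term
        # Two quantization forwards per step -- the source of the ~2x cost over vanilla QAT.
        return (1.0 - frac) * uniform_quantize(x, b_lo) + frac * uniform_quantize(x, b_hi)
```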
> I compared FracBits only with NNCF 8-bit QAT; the results are in the README.md included in this PR. They show that FracBits can reduce the total weight bit count (model size) by about 1.5x relative to NNCF 8-bit QAT for 3 models (MobileNet-V2, Inception-V3, and ResNet-50) and 2 datasets (ImageNet and CIFAR100) with comparable accuracy degradation (<1%).
It doesn't look fair to compare only against 8-bit QAT. Do you have any comparison results (time / accuracy / compression rate / ease of use) against HAWQ and AutoQ?
> I think it can be used by users who want to compress their model size further with mixed-precision QAT. Unlike HAWQ and AutoQ, it does not require a time-consuming initialization phase or an external exploration phase. However, it requires twice as many quantization forward-backward passes as vanilla QAT.
If I understand correctly, the user ultimately wants to get a smaller model than the INT8 model in the OpenVINO format. Does OpenVINO support your model? You reported the theoretical compression rate in README.md; what is the actual compression rate?