haq
Regarding paper and codes
After digging into the code and the paper, I have two questions.
- The paper states: "If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied." Which part of the code corresponds to this step, i.e., decreasing the bitwidth of each layer when the current policy exceeds the budget?
- Why isn't k-means quantization used for the latency/energy-constrained experiments? Will you release the code for linear quantization?
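For reference, the step quoted from the paper can be sketched as below. This is a minimal illustration and not the repository's code: the function name `enforce_budget`, the per-layer `layer_costs`, the linear cost model (cost proportional to bitwidth), and the `min_bit` floor are all assumptions made for the example.

```python
def enforce_budget(bitwidths, layer_costs, budget, min_bit=2):
    """Lower bitwidths one layer at a time until the total cost fits the budget.

    Hypothetical helper: the cost of a layer is assumed to be
    layer_costs[i] * bitwidths[i], which is only a stand-in for a real
    latency, energy, or model-size estimate.
    """
    bits = list(bitwidths)  # work on a copy so the caller's policy is untouched

    def total_cost(b):
        return sum(c * w for c, w in zip(layer_costs, b))

    while total_cost(bits) > budget:
        changed = False
        for i in range(len(bits)):
            if bits[i] > min_bit:
                bits[i] -= 1  # decrease this layer by one bit
                changed = True
                if total_cost(bits) <= budget:
                    return bits  # constraint satisfied, stop immediately
        if not changed:
            break  # every layer is already at min_bit; the budget is unreachable
    return bits
```

For example, `enforce_budget([8, 8, 8], [1.0, 2.0, 3.0], budget=40)` sweeps over the layers twice and stops as soon as the assumed cost drops to 40 or below.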
Hi, I have the second question as well. Did you manage to reproduce the quantization method? I tried to reproduce it on CIFAR-10 with ResNet-20, following Section 3.4 of the paper; however, the linear quantization method did not work.
I noticed that the code uses k-means quantization, whereas the paper says it finds the optimal clip value that minimizes the KL divergence between the non-quantized and quantized weights/activations. That describes linear quantization, which is different from what the code does.
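To make the clip-value search concrete, here is a rough sketch of the idea: try several clip values, quantize linearly with each, and keep the one whose quantized histogram is closest in KL divergence to the original histogram. It is not the paper's or the repository's implementation; the helper names, the histogram resolution, the smoothing constant, and the symmetric signed grid are all assumptions, and the result depends noticeably on those choices.

```python
import math

def histogram(values, lo, hi, n_bins):
    """Normalized histogram with a small floor so the KL divergence stays finite."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(n_bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    eps = 1e-6  # assumed smoothing constant
    total = len(values) + eps * n_bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def quantize_with_clip(values, clip, n_bits):
    """Clamp to [-clip, clip], then round onto a uniform signed grid."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = clip / q_max
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]

def search_clip(values, n_bits, n_candidates=50, n_bins=128):
    """Return the candidate clip value with the smallest KL divergence, and that KL."""
    max_abs = max(abs(v) for v in values)
    p = histogram(values, -max_abs, max_abs, n_bins)
    best_clip, best_kl = None, float("inf")
    for i in range(1, n_candidates + 1):
        clip = max_abs * i / n_candidates
        quantized = quantize_with_clip(values, clip, n_bits)
        q = histogram(quantized, -max_abs, max_abs, n_bins)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_clip, best_kl = clip, kl
    return best_clip, best_kl
```

Calling `search_clip(weights, n_bits=4)` on a list of floats returns the selected clip value together with its KL divergence.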
This confuses me as well. The paper uses linear quantization, but the code provides k-means quantization (similar to "Deep Compression"). After k-means quantization, there is no guarantee that the weights can be processed by fixed-point arithmetic units.
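The difference can be illustrated with a small pure-Python sketch (not taken from the repository; the function names and the symmetric signed grid are assumptions). Linear quantization stores integer codes that share one scale, so the values map directly onto fixed-point hardware. K-means stores indices into a table of learned floating-point centroids, which are generally not evenly spaced.

```python
import random

def linear_quantize(weights, n_bits):
    """Symmetric uniform quantization: each value is an integer code times one scale."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in weights) / q_max
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale

def kmeans_quantize(weights, n_bits, n_iter=20):
    """1-D k-means: each value is replaced by the index of its nearest centroid."""
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # linear init
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for w in weights:
            idx = min(range(k), key=lambda j: abs(w - centroids[j]))
            clusters[idx].append(w)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    assign = [min(range(k), key=lambda j: abs(w - centroids[j])) for w in weights]
    return assign, centroids

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(500)]
codes, scale = linear_quantize(weights, n_bits=3)      # integers in [-3, 3]
assign, centroids = kmeans_quantize(weights, n_bits=3)  # 8 float centroids
```

With the linear scheme a multiply-accumulate can run on the integer `codes` and be rescaled once by `scale`; with k-means each weight first has to be looked up in `centroids`.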
It's quite unfortunate that the main novelty claimed by the paper, i.e., the use of direct hardware feedback, is conveniently missing from this repo. In fact, even the paper fails to provide a clear explanation of that claim.
We have updated this repo with linear quantization as well as the hardware resource-constrained part. Please let us know if you have any questions.
Can you please point to the part where the direct HW feedback is used? Thanks. Without that, the repo is still quite limited in significance.
Thanks for your feedback! You can find the related code at https://github.com/mit-han-lab/haq/blob/7141586e9ae47c8a50aa8b596ab37682a06b434a/lib/env/linear_quantize_env.py#L306