
`hf-bitsandbytes-integration.md` Incorrect Dequantization

HanGuo97 opened this issue 1 year ago · 3 comments

Hi,

In the bitsandbytes integration blog, it says one could retrieve the FP16 weights via

(int8_model[0].weight.CB * int8_model[0].weight.SCB) / 127

However, this is incorrect. For a 2D weight matrix, it should instead be (note the explicit unsqueeze call):

(int8_model[0].weight.CB * int8_model[0].weight.SCB.unsqueeze(dim=1)) / 127

The former code only works because the matrix happens to be square, so the broadcast still runs but silently scales along the wrong dimension. With a (non-square) rectangular matrix, the same code does not run at all.
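
To illustrate the broadcasting issue with plain tensors (a minimal sketch, independent of bitsandbytes; the shapes are made up for the example):

import torch

# CB plays the role of the int8 weight matrix [out_features, in_features],
# SCB the per-row scales of shape [out_features]
CB = torch.randint(-127, 128, (4, 6), dtype=torch.int8)
SCB = torch.rand(4)

# Correct: unsqueeze turns SCB into a column vector, so each row is scaled by its own factor
dequant = (CB.float() * SCB.unsqueeze(dim=1)) / 127   # shape [4, 6]

# Without unsqueeze, broadcasting matches SCB against the last dimension:
# on a square matrix it silently scales the columns instead of the rows,
# on this rectangular matrix it raises a shape error
try:
    (CB.float() * SCB) / 127
except RuntimeError as err:
    print(err)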

I don't know the inner workings of bitsandbytes well enough to be sure: could someone help me confirm the proper way to de-quantize the matrix? Thanks in advance for the help!

HanGuo97 avatar May 20 '23 19:05 HanGuo97

cc @younesbelkada

pcuenca avatar May 21 '23 10:05 pcuenca

Great catch, thanks very much, you are totally right here. For anyone who is curious, to reproduce quickly:

from transformers import AutoModelForCausalLM

# Load OPT-350m in 8-bit; its project_out layer has a rectangular (non-square) weight matrix
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", device_map="auto", load_in_8bit=True)
weight_matrix = model.model.decoder.project_out.weight

# Correct: SCB holds one scale per output row, so it must broadcast along dim 1
print((weight_matrix.CB * weight_matrix.SCB.unsqueeze(dim=1)) / 127)
# Incorrect: broadcasting the [out_features] scale vector against the last dim raises a shape error
print((weight_matrix.CB * weight_matrix.SCB) / 127)  # this fails
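
For completeness, the corrected formula can be wrapped in a small helper (just a sketch; dequantize_int8_weight is an illustrative name, not a bitsandbytes or transformers API):

def dequantize_int8_weight(weight):
    # weight.CB: int8 quantized matrix of shape [out_features, in_features]
    # weight.SCB: per-row quantization scales of shape [out_features]
    return (weight.CB.float() * weight.SCB.unsqueeze(dim=1)) / 127

# Recovers a floating-point approximation of the original weights
print(dequantize_int8_weight(weight_matrix).shape)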

@HanGuo97 Would you mind opening a Pull Request to fix that? Otherwise, I'm happy to do it.

younesbelkada avatar May 22 '23 10:05 younesbelkada

Thanks for confirming this!

Unfortunately, I'm a bit swamped by an upcoming deadline, so I don't think I could create a PR in the short term :/

HanGuo97 avatar May 22 '23 13:05 HanGuo97