`hf-bitsandbytes-integration.md` Incorrect Dequantization
Hi,
In the bitsandbytes integration blog, it says one could retrieve the FP16 weights via `(int8_model[0].weight.CB * int8_model[0].weight.SCB) / 127`.
However, this is incorrect. For a 2D matrix, it should have been (note the explicit `unsqueeze` call): `(int8_model[0].weight.CB * int8_model[0].weight.SCB.unsqueeze(dim=1)) / 127`.
The former code only works because that particular matrix happens to be square, so the broadcasting dimension is silently wrong; the same code would not run with a (non-square) rectangular matrix.
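To illustrate with plain tensors (a minimal sketch with made-up shapes, assuming `SCB` holds one scale per output row, as the `unsqueeze(dim=1)` suggests):

```python
import torch

# Hypothetical rectangular int8 weight with one scale per output row,
# mimicking how CB (int8 data) and SCB (per-row scales) are laid out.
CB = torch.randint(-127, 128, (4, 8), dtype=torch.int8)  # (out_features, in_features)
SCB = torch.rand(4) * 10                                  # one scale per output row

# Correct: unsqueeze makes SCB broadcast over rows -> (4, 1) * (4, 8)
dequant = (CB.float() * SCB.unsqueeze(dim=1)) / 127
print(dequant.shape)  # torch.Size([4, 8])

# Incorrect: (4, 8) * (4,) tries to broadcast along the last dim and raises a RuntimeError
# dequant_wrong = (CB.float() * SCB) / 127
```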
I don't know the inner workings of bitsandbytes well enough, so could someone help me confirm the proper way to de-quantize the matrix? Thanks in advance for the help!
cc @younesbelkada
Great catch, thanks very much, you are totally right here! For anyone that is curious, to reproduce quickly:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", device_map="auto", load_in_8bit=True)
weight_matrix = model.model.decoder.project_out.weight

print((weight_matrix.CB * weight_matrix.SCB.unsqueeze(dim=1)) / 127)  # works
print((weight_matrix.CB * weight_matrix.SCB) / 127)  # this fails
```
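If you want to reuse the corrected formula, here is a small sketch of a helper; `dequantize_int8_weight` is a hypothetical name, and it assumes the layer's weight exposes `CB` and `SCB` exactly as in the snippet above:

```python
import torch

def dequantize_int8_weight(weight) -> torch.Tensor:
    """Recover an approximate FP16 weight from an 8-bit linear layer's weight.

    Assumes `weight` carries `CB` (int8 data, shape [out, in]) and `SCB`
    (per-row scales, shape [out]), as seen with load_in_8bit above.
    """
    return ((weight.CB * weight.SCB.unsqueeze(dim=1)) / 127).to(torch.float16)

# Hypothetical usage with the model above:
# fp16_weight = dequantize_int8_weight(model.model.decoder.project_out.weight)
```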
@HanGuo97 Would you mind opening a Pull Request to fix that? Otherwise I'm happy to do it!
Thanks for confirming this!
Unfortunately, I'm a bit swamped by an upcoming deadline, so I don't think I could create a PR in the short term :/