
How to recover the float16 weight?


Say I have a Linear8bitLt module with an int8 weight on GPU, converted from an nn.Linear module with a float16 weight. How can I restore the float16 weight, so that I can do some customized computation that is not supported in int8?

The blog post A Gentle Introduction to 8-bit Matrix Multiplication mentions: "You might also wonder how to retrieve the FP16 weights in order to perform the outlier MatMul in fp16? You can simply do: (int8_model[0].weight.CB * int8_model[0].weight.SCB) / 127"

This method does not work for me because both weight.CB and weight.SCB are None. I also tried (int8_model[0].state.CxB * int8_model[0].state.SCB) / 127, but the result is not aligned with the original float16 weight.

FYI, the Linear8bitLt module here is from EleutherAI/gpt-neox-20b (Hugging Face): GPTNeoXForCausalLM.gpt_neox.layers[0].attention.dense
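For reference, a minimal sketch of what I understand the blog-post formula to expand to, assuming CB (the row-major int8 weight) and SCB (the per-row absmax scales) are still populated; the layer path and the unsqueeze for broadcasting are assumptions on my part, not something confirmed by the library docs:

```python
import torch

# model: the quantized GPTNeoXForCausalLM from this issue (hypothetical variable name).
layer = model.gpt_neox.layers[0].attention.dense

if layer.weight.CB is not None and layer.weight.SCB is not None:
    # Dequantize: W_fp16 ≈ CB * SCB / 127, broadcasting the per-row scale over the columns.
    w_fp16 = (layer.weight.CB.float() * layer.weight.SCB.float().unsqueeze(1) / 127.0).half()
else:
    # In some bitsandbytes versions of this era, CB/SCB are freed after the first
    # forward pass, which would explain the None values described above.
    w_fp16 = None
```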


xiaoda99 · Sep 21 '22 00:09

I am on xiaoda99's team.

After reading the source code, we found that among the parameters in the layer, the value of weight is CxB, and weight.state.formatB == "col_ampere". So we expected to use the transform function again to convert CxB back to a normal row-major representation, with the transform parameters from_order="col_ampere", to_order="row" (roughly the call sketched below). But we get the following error:
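A minimal sketch of the call we attempted, assuming layer refers to the Linear8bitLt module; the variable names are ours, not from the library:

```python
import bitsandbytes.functional as F

# layer.state.CxB holds the int8 weight in the tiled "col_ampere" layout.
state = layer.state

# Try to transform it back to a plain row-major int8 tensor.
# This is the call that fails with the undefined symbol shown below.
w_int8_row, _ = F.transform(state.CxB, from_order=state.formatB, to_order="row", state=state.SB)
```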

AttributeError: /.../miniconda3/envs/torch1.7/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: ctransform_ampere2row

It looks like ctransform_ampere2row is not implemented. When will this feature be released?

moonscar · Sep 29 '22 06:09

The transformation from col_ampere/col_turing to row-major is not supported by NVIDIA and is also not supported by my library. I will not implement it, since it is a very complicated function that would take weeks of work.

However, an alternative for recovering the int8 weight is to store the row-major int8 tensor and convert it to col_ampere/col_turing only when needed.
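A hedged sketch of that approach, where w_int8_row is an illustrative name for a row-major int8 tensor you keep resident yourself; bitsandbytes.functional.transform does support the row-to-tiled direction:

```python
import bitsandbytes.functional as F

# w_int8_row: row-major int8 weight on GPU, kept as the persistent copy.
# Convert to the GPU-specific tiled layout only right before the int8 matmul;
# the reverse direction (col_ampere/col_turing -> row) is the unsupported one.
CxB, SB = F.transform(w_int8_row, to_order="col_ampere", from_order="row")
```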

Alternatively, you can store the weight in row-major int8 and use fp16 compute: in other words, int8 storage with fp16 compute. Support for this was added recently (autograd, module).
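A minimal sketch of the int8-storage / fp16-compute idea in plain PyTorch, not the bitsandbytes API referenced above; the per-row absmax scheme mirrors how the SCB scales are defined, and the function names are illustrative:

```python
import torch

def quantize_rowwise(w_fp16: torch.Tensor):
    # Per-row absmax int8 quantization: scale each row so its absmax maps to 127.
    scale = w_fp16.abs().max(dim=1, keepdim=True).values.float().clamp(min=1e-8)  # like SCB, one scale per row
    w_int8 = torch.clamp(torch.round(w_fp16.float() * 127.0 / scale), -127, 127).to(torch.int8)
    return w_int8, scale

def dequantize_rowwise(w_int8: torch.Tensor, scale: torch.Tensor):
    # Recover an fp16 approximation of the original weight for customized fp16 compute.
    return (w_int8.float() * scale / 127.0).half()
```

Only w_int8 and scale need to stay resident; dequantize_rowwise is called right before the fp16 operation that int8 cannot express.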

TimDettmers · Oct 10 '22 01:10