bitsandbytes
How to recover the float16 weight?
Say I have a Linear8bitLt module with an int8 weight on GPU, converted from an nn.Linear module with a float16 weight. How can I restore the float16 weight, so that I can do some customized computation that is not supported in int8?

The blog post "A Gentle Introduction to 8-bit Matrix Multiplication" mentions: "You might also wonder how to retrieve the FP16 weights in order to perform the outlier MatMul in fp16? You can simply do: (int8_model[0].weight.CB * int8_model[0].weight.SCB) / 127"

This method does not work for me because both weight.CB and weight.SCB are None. I also tried

(int8_model[0].state.CxB * int8_model[0].state.SCB) / 127

but the result is not aligned with the original float16 weight.
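For reference, here is a minimal sketch of what I am doing (layer sizes and variable names are made up for illustration):

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

# Toy setup: quantize an fp16 Linear to int8 by moving it to the GPU.
fp16_linear = nn.Linear(64, 64, bias=False).half()
int8_linear = bnb.nn.Linear8bitLt(64, 64, bias=False, has_fp16_weights=False)
int8_linear.load_state_dict(fp16_linear.state_dict())
int8_linear = int8_linear.cuda()  # int8 quantization happens on this move

w = int8_linear.weight
# The blog-post recipe: SCB holds per-row absmax scales, so broadcast it
# over the rows of the int8 matrix CB to recover an approximate fp16 weight.
if w.CB is not None and w.SCB is not None:
    restored = (w.CB.float() * w.SCB.float().unsqueeze(1) / 127).half()
else:
    print("weight.CB / weight.SCB are None here, which is my problem")
```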
FYI, the Linear8bitLt module here is from the Hugging Face model EleutherAI/gpt-neox-20b: GPTNeoXForCausalLM.gpt_neox.layers[0].attention.dense
- Da Xiao
I was on xiaoda99's team.
After reading the source code, we found that among the parameters in the layer, the value of weight is CxB, and weight.state.formatB is "col_ampere". So we expected to be able to use the transform function again to convert CxB back to a normal representation, with from_order="col_ampere" and to_order="row". But we ran into the following error:
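Roughly, the attempt looked like this sketch (exact call details depend on the bitsandbytes version; CxB comes from the layer state as above):

```python
import bitsandbytes.functional as F

# state.CxB is the int8 weight stored in the tiled "col_ampere" layout.
CxB = int8_linear.state.CxB

# Try to invert the layout transform back to a row-major int8 tensor.
CB_row, _ = F.transform(CxB, to_order="row", from_order="col_ampere")
# -> raises the AttributeError below
```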
AttributeError: /.../miniconda3/envs/torch1.7/lib/python3.7/site-packages/bitsandbytes/libbitsandbytes.so: undefined symbol: ctransform_ampere2row
It looks like ctransform_ampere2row is not implemented. When will this feature be released?
The transformation from col_ampere/col_turing to row-major is not supported by NVIDIA and is also not supported by my library. I will not implement it, since it is a very complicated function that would take weeks of work.
However, an alternative for recovering the int8 weight is to store the row-major int8 tensor and convert it to col_ampere/col_turing only when needed.
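A sketch of that approach (illustrative shapes and values; row to col_ampere is the direction the library does support):

```python
import torch
import bitsandbytes.functional as F

# Canonical storage: a row-major int8 tensor, e.g. CB saved at quantization
# time before it is freed. Random values here just for illustration.
W_int8 = torch.randint(-127, 128, (64, 64), dtype=torch.int8, device="cuda")

# Convert to the GPU-specific tiled layout only when an int8 kernel needs it.
CxB, SB = F.transform(W_int8, to_order="col_ampere")  # from_order defaults to "row"
```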
Alternatively, you can store the weight in row-major int8 and use fp16 compute, in other words, int8 storage with fp16 compute. Support for this was added recently (autograd, module).
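As an illustration of what int8 storage with fp16 compute means (my own sketch with made-up shapes, not the actual autograd/module implementation):

```python
import torch

def linear_fp16_from_int8(x, CB, SCB):
    # Dequantize the row-major int8 weight to fp16, then matmul in fp16.
    W = (CB.float() * SCB.float().unsqueeze(1) / 127).half()  # [out, in]
    return x @ W.t()

x = torch.randn(8, 64, dtype=torch.float16, device="cuda")   # fp16 activations
CB = torch.randint(-127, 128, (64, 64), dtype=torch.int8, device="cuda")
SCB = torch.rand(64, device="cuda") * 2                      # per-row scales
y = linear_fp16_from_int8(x, CB, SCB)
```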