InternVL
Align vocab_size to a Multiple of 32 to Prevent Shape Mismatch Errors with bnb Quantization on Ampere GPUs
As mentioned in https://github.com/OpenGVLab/InternVL/issues/129:

It is recommended to set the tokenizer's vocab_size to a multiple of 32 (and to adjust the dimensions of the embedding layer and the final lm_head, i.e., language_model.output, accordingly). Otherwise, after quantization with bitsandbytes (bnb), the model may raise errors during the backward pass on GPUs with CUDA compute capability 8.0 or higher (Ampere and newer). This is because bnb internally pads tensor shapes up to the nearest multiple of 32, leading to shape mismatches.
The relevant padding logic in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L508-L512
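
For reference, here is a minimal sketch of one way to do this alignment with Hugging Face transformers. The checkpoint path is a placeholder, and `model.language_model` follows the attribute naming of the InternVL chat models; adapt both to your setup.

```python
import math
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL-Chat-V1-5"  # placeholder checkpoint; use your own

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True)

# Round the vocabulary size up to the next multiple of 32. The extra
# rows in the embedding matrix and lm_head are unused padding entries.
new_vocab_size = math.ceil(len(tokenizer) / 32) * 32

# resize_token_embeddings adjusts both the input embeddings and the
# output projection (language_model.output) of the wrapped LLM.
model.language_model.resize_token_embeddings(new_vocab_size)
```

On recent transformers versions, the rounding can also be delegated to the library via `model.language_model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)`. Either way, do the resize before applying bnb quantization so that the already-aligned shapes are what bnb sees.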