InternVL
Align vocab_size to a Multiple of 32 to Prevent Shape Mismatch Errors with bnb Quantization on Ampere GPUs
As mentioned in https://github.com/OpenGVLab/InternVL/issues/129:

It is recommended to set the tokenizer's vocab_size to a multiple of 32 (and to adjust the dimensions of the embedding layer and the final lm_head, i.e., language_model.output, accordingly). Otherwise, after quantization with bitsandbytes (bnb), the model may raise errors during the backward pass on GPUs with CUDA compute capability 8.0 or higher (Ampere and newer). This is because bnb internally pads tensor shapes up to the nearest multiple of 32, leading to shape mismatches.
The relevant padding logic in bitsandbytes: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L508-L512
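
For reference, here is a minimal sketch of one way to do this alignment with Hugging Face transformers. The checkpoint path is a placeholder, and `model.language_model` follows the attribute naming of the InternVL chat models; adapt both to your setup.

```python
import math
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL-Chat-V1-5"  # placeholder checkpoint; use your own

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True)

# Round the vocabulary size up to the next multiple of 32. The extra
# rows in the embedding matrix and lm_head are unused padding entries.
new_vocab_size = math.ceil(len(tokenizer) / 32) * 32

# resize_token_embeddings adjusts both the input embeddings and the
# output projection (language_model.output) of the wrapped LLM.
model.language_model.resize_token_embeddings(new_vocab_size)
```

On recent transformers versions, the rounding can also be delegated to the library via `model.language_model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)`. Either way, do the resize before applying bnb quantization so that the already-aligned shapes are what bnb sees.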