
Add quantization_config in AutoModelForCausalLM.from_config()

Open ishaansharma opened this issue 2 years ago • 10 comments

Feature request

Add a quantization_config argument to AutoModelForCausalLM.from_config(). I am trying to pretrain a model from scratch and use bitsandbytes so that it can be trained on less computationally expensive machines. Below is my quantization config:

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

When I loaded a model's config with AutoConfig.from_pretrained and passed the quantization config to from_config, it failed and raised the TypeError below.

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_config(config, quantization_config=bnb_config, device_map={"": 0})

The Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[23], line 7
      3 # Download configuration from huggingface.co and cache.
      5 configy = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
----> 7 modely = AutoModelForCausalLM.from_config(configy,quantization_config=bnb_config, device_map={"":0})

File ~/miniconda3/envs/ai/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:441, in _BaseAutoModelClass.from_config(cls, config, **kwargs)
    439 elif type(config) in cls._model_mapping.keys():
    440     model_class = _get_model_class(config, cls._model_mapping)
--> 441     return model_class._from_config(config, **kwargs)
    443 raise ValueError(
    444     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    445     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    446 )

File ~/miniconda3/envs/ai/lib/python3.10/site-packages/transformers/modeling_utils.py:1192, in PreTrainedModel._from_config(cls, config, **kwargs)
   1190         model = cls(config, **kwargs)
   1191 else:
-> 1192     model = cls(config, **kwargs)
   1194 # restore default dtype if it was modified
   1195 if dtype_orig is not None:

TypeError: MistralForCausalLM.__init__() got an unexpected keyword argument 'quantization_config'

Motivation

I tried a workaround: initialize the model from the config, save it, and then load the saved model again with the quantization config.

I believe this could be improved so that quantization can be enabled while loading the model from the config itself.

Your contribution

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_config(config)

# save the randomly initialized model, then reload it with quantization applied
model.save_pretrained(MODEL_NAME_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME_PATH, quantization_config=bnb_config, device_map={"": 0})

ishaansharma avatar Oct 18 '23 12:10 ishaansharma

WDYT @younesbelkada

ArthurZucker avatar Oct 19 '23 13:10 ArthurZucker

Hi @ishaansharma Thanks a lot for the proposal! I personally would not advocate going that route: the quantization schemes we support right now consist of post-training quantization, meaning the use case is always

1. load pre-trained weights from the hub or locally
2. quantize the pre-trained weights

The API you propose is cool, but I am afraid it will not be used in practice, as from_config loads random weights into the model. Let me know if I misunderstood anything!
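
For reference, a minimal sketch of that supported flow (the checkpoint name and config values are just the ones from this issue):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 1) load the pre-trained weights and 2) quantize them on the fly
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map={"": 0},
)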

younesbelkada avatar Oct 23 '23 09:10 younesbelkada

Hi @ishaansharma Thanks a lot for the proposal! I personally would not advocate going that route: the quantization schemes we support right now consist of post-training quantization, meaning the use case is always

1. load pre-trained weights from the hub or locally
2. quantize the pre-trained weights

The API you propose is cool, but I am afraid it will not be used in practice, as from_config loads random weights into the model. Let me know if I misunderstood anything!

1. I wanted this feature because it would be very useful for pre-training any large language model with a huge parameter count from scratch, which usually cannot be done on small machines with very little compute.

2. To pre-train a model from scratch and build a language model for a totally new language, I don't think the random weights loaded from the config will cause any harm, as the weights will eventually get updated during training.

@younesbelkada, I just want pre-training a model for any language from scratch, using any LLM architecture, to be possible on any machine.

Let me know if this approach helps.

Warm regards.

ishaansharma avatar Oct 23 '23 10:10 ishaansharma

Thanks for getting back to me @ishaansharma !

I wanted this feature because it would be very useful for pre-training any large language model with a huge parameter count from scratch, which usually cannot be done on small machines with very little compute.

Since you cannot perform full fine-tuning when the model is quantized, I think this is technically not possible :/ The same comment also applies to your thought here:

To pre-train a model from scratch and build a language model for a totally new language, I don't think the random weights loaded from the config will cause any harm, as the weights will eventually get updated during training.

younesbelkada avatar Oct 23 '23 14:10 younesbelkada

I have a similar use case, but I want to load huge models efficiently, so I've been following this guide, which first loads the empty model from a config and then loads the state into the empty model. But I do not understand how we can add other parameters (like load_in_8bit) to this process: from_config does not support such kwargs, and neither does load_checkpoint_and_dispatch. So is that simply not possible in this kind of workflow? How else would one efficiently and quickly load a model in 8 bit? @younesbelkada
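
For reference, a rough sketch of that workflow (the checkpoint path is a placeholder):

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

# build the model skeleton without allocating memory for the weights
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# load the checkpoint and place the weights across the available devices;
# there is no obvious place to pass load_in_8bit or a quantization_config here
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="/path/to/checkpoint",  # placeholder
    device_map="auto",
    no_split_module_classes=model._no_split_modules,
)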

BramVanroy avatar Mar 13 '24 07:03 BramVanroy

Hey, I stumbled upon the same issue; I would've liked to be able to supply a device_map to AutoModel.from_config. :)

janEbert avatar Oct 17 '24 19:10 janEbert

cc @SunMarc

LysandreJik avatar Oct 18 '24 09:10 LysandreJik

Hey @janEbert, what would be the use case for loading the model with from_config and device_map? A workaround is to save the model loaded with from_config, then use from_pretrained to load it again.

If you want to quantize the model loaded with from_config, please read the points that Younes shared above. Thanks!

SunMarc avatar Oct 18 '24 14:10 SunMarc

The use case is to have the model properly distributed automatically. The workaround does work but is extremely hacky and ugly, if I'm completely honest. :sweat_smile: Cheers for the suggestion, though!

janEbert avatar Oct 18 '24 14:10 janEbert

The use case is to have the model properly distributed automatically

We recommend using device_map for inference, but it might not be very useful on a model with random weights.

Nevertheless, the algorithm behind device_map requires us to have the loaded weights somewhere. When using from_config, we initialize the weights from the model definition and not from a file stored on the Hub. If you can load the entire model on the CPU, then what you can do is use the dispatch_model function to have the model distributed across your GPUs.

from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map

config = AutoConfig.from_pretrained("model")
model = AutoModelForCausalLM.from_config(config)

# infer a device_map from the instantiated model
device_map = infer_auto_device_map(model, no_split_module_classes=model._no_split_modules)

# distribute the model across the available devices
dispatch_model(model, device_map)

LMK if this works for you! You can find more information on how device_map works here.

SunMarc avatar Oct 18 '24 14:10 SunMarc

Thanks a lot for the infer_auto_device_map and dispatch_model suggestion! As you can tell, I would like to avoid loading the model on the CPU first, so I'm not limited by RAM regarding the model size.

Sorry for not giving enough information in the first place. My use case is that I want to convert a model from custom code to HF "stdlib" code. The converted model is instantiated via from_config from a converted config, and then I load the converted state dict into it. However, since device_map is not supported with from_config, I am limited by the CPU RAM. Even your really nice suggestions don't help in that case; not even the first one, since I'd still have to be able to instantiate the model on single-node CPU first. :/
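
Roughly, the conversion step looks like this (paths are placeholders), and the whole model has to be materialized in CPU RAM before anything can be dispatched:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("/path/to/converted_config")  # placeholder

# from_config materializes the full (randomly initialized) model on the CPU,
# so this already needs enough single-node CPU RAM for the whole model
model = AutoModelForCausalLM.from_config(config)

# load the converted state dict (also on the CPU) and copy it into the model
state_dict = torch.load("/path/to/converted_state_dict.pt", map_location="cpu")  # placeholder
model.load_state_dict(state_dict)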

janEbert avatar Oct 21 '24 08:10 janEbert

Even your really nice suggestions don't help in that case; not even the first one (https://github.com/huggingface/transformers/issues/26901#issuecomment-2422621147), since I'd still have to be able to instantiate the model on single-node CPU first. :/

How big is the model? The model should be sharded, so it should only take max_shard_size in terms of memory. I think that in save_pretrained, we set the shard size to 5GB. Also, if the model is in safetensors format, we should be able to load the model directly onto the GPU without going through the CPU.
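
As a sketch of what that could look like for the converted model from the previous step (the path and helper name are placeholders):

from transformers import AutoModelForCausalLM

def save_and_reload_sharded(model, path="/path/to/converted_model"):  # hypothetical helper
    # saving produces sharded safetensors files (the default max_shard_size is "5GB"),
    # so reloading only needs roughly one shard in memory at a time
    model.save_pretrained(path, max_shard_size="5GB")

    # with sharded safetensors, from_pretrained can place the weights across the
    # available GPUs instead of materializing the whole model on the CPU first
    return AutoModelForCausalLM.from_pretrained(path, device_map="auto")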

SunMarc avatar Oct 22 '24 14:10 SunMarc

Would this work in the multi-node setting as well? Because the model is too big to fit on one node. Sorry that wasn't clear.

janEbert avatar Oct 22 '24 18:10 janEbert

No, this doesn't work in a multi-node setting. However, we are working on making transformers models compatible with PP/TP methods from PyTorch that work multi-node!

SunMarc avatar Oct 23 '24 12:10 SunMarc

#34184 for the linked PR! 🤗

ArthurZucker avatar Oct 24 '24 15:10 ArthurZucker