
How to save a quantized model

Open zyzhang1130 opened this issue 1 year ago • 2 comments

I am having the following issue when pushing the trained 4-bit model to Hugging Face through base_model.push_to_hub("my-awesome-model"):

NotImplementedError: You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported

Is there any alternative to save the trained quantized model?

zyzhang1130 avatar Jun 02 '23 05:06 zyzhang1130

Is this what you are looking for? I know how to save it locally, but I don't know how to solve the push-to-Hub problem.

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=5,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="PATH_TO_SAFE_MODEL",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

Alternative code

trainer.save_model("gpt-neo-x-20b-finetuned-4bit")
trainer.model.config.to_json_file("gpt-neo-x-20b-finetuned-4bit/config.json")
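
For the push-to-Hub side of the question, a minimal sketch, assuming `model` is the PEFT-wrapped (QLoRA) model from the snippet above and you are logged in to the Hub; pushing the adapter alone avoids serializing the 4-bit base:

# Uploads only the LoRA adapter weights and adapter_config.json, not the 4-bit base model.
model.push_to_hub("my-awesome-model")
tokenizer.push_to_hub("my-awesome-model")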

szymonrucinski avatar Jun 02 '23 08:06 szymonrucinski

Thanks. I guess saving locally also works, although I was hoping for a way to push to Hugging Face, which is easier to use later for downstream tasks.

zyzhang1130 avatar Jun 02 '23 08:06 zyzhang1130

Hi, I have added

trainer.save_model("gpt-neo-x-20b-finetuned-4bit")
trainer.model.config.to_json_file("gpt-neo-x-20b-finetuned-4bit/config.json")

at bottom of https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing#scrollTo=kdCccwcOQiCP

But it gives this error.

[error screenshot]

The same thing happens in my local environment. Am I doing anything wrong?

bubundas17 avatar Jun 06 '23 20:06 bubundas17

Would this save just the fine-tuned LoRA adapters in 4-bit, or the whole merged 4-bit model? I am looking for a way to save the whole model in 4-bit after fine-tuning on a 4-bit base...

shawei3000 avatar Jun 07 '23 10:06 shawei3000

After saving the model and config, how do I load it again in Colab?

Preetika764 avatar Jun 14 '23 08:06 Preetika764

When I did

model.save_pretrained("my-awesome-model")
tokenizer.save_pretrained("my-awesome-model_tokenizer")

to save my model trained from andreaskoepf/pythia-1.4b-gpt4all-pretrain instead, here are the files I got:

[screenshot of the saved files]

Is it normal that the saved model and tokenizer are so small?

zyzhang1130 avatar Jun 14 '23 14:06 zyzhang1130

Yes, this just saves the set of adapters. You'll have to merge it after loading the model again.
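
For reference, a minimal sketch of reloading the saved adapter for inference, using the base model mentioned above and the illustrative adapter directory "my-awesome-model":

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Re-load the 4-bit base, then attach the saved LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "andreaskoepf/pythia-1.4b-gpt4all-pretrain",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "my-awesome-model")  # adapter directory saved earlier
model.eval()  # ready for inference with the adapter applied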

Preetika764 avatar Jun 18 '23 07:06 Preetika764

Yes, this just saves the set of adapters. You'll have to merge it after loading the model again.

May I ask how this differs from how we usually load a model? Currently, I am just using something like

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    'saved_checkpoint_path',
    load_in_8bit=True,
    peft_config=qlora_config,
)

zyzhang1130 avatar Jun 18 '23 08:06 zyzhang1130

quantization_config = BitsAndBytesConfig(
    load_in_4bit=False,
    load_in_8bit=False,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    'model-id',
    low_cpu_mem_usage=True,
    load_in_4bit=False,
    return_dict=True,
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map='auto',
)

model = PeftModel.from_pretrained(model, adapter_path, offload_folder="/content/sample_data")
model = model.merge_and_unload()
torch.save(model.state_dict(), "merged_model.pt")  # torch.save needs a target path; the filename here is illustrative

Now convert to GGML using llama.cpp and quantize.

Preetika764 avatar Jun 18 '23 08:06 Preetika764

It's funny that you can train a 7B LLM in Google Colab but can't save it with the original weights.

shaktiman101 avatar Aug 18 '23 16:08 shaktiman101

Loading in 4-bit and saving with the normal Hugging Face Trainer does not work as of now.

model.train()
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=1,  # cannot save here, nor via trainer.save_model()
    save_total_limit=1,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=False,  # NotImplementedError: You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    # group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)

# Create a Trainer instance
trainer = Trainer(
  model=model,
  data_collator=data_collator,
  train_dataset=train_dataset,
  args=training_arguments,
)

trainer.train()

Error

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-15-252f3821ece6> in <cell line: 2>()
      1 # Train model
----> 2 trainer.train()

6 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, **kwargs)
   1713 
   1714         if getattr(self, "is_loaded_in_4bit", False):
-> 1715             raise NotImplementedError(
   1716                 "You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported"
   1717             )

This matches https://github.com/TimDettmers/bitsandbytes/issues/695, which is still open as of now.

However, with SFTTrainer I am able to save the adapter weights and config with a model loaded in 4-bit:

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset['train'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
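
For completeness, a minimal sketch of saving after training with the setup above; since peft_config is set, trainer.model is a PEFT model, so (as described above) only the adapter weights and config are written:

trainer.train()

# Writes adapter_model.bin and adapter_config.json into output_dir,
# not the full 4-bit base weights.
trainer.save_model(output_dir)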

alexcpn avatar Aug 21 '23 05:08 alexcpn

This issue can probably be closed.

Here is the PR to follow for this: https://github.com/TimDettmers/bitsandbytes/pull/753

RonanKMcGovern avatar Jan 08 '24 12:01 RonanKMcGovern

@RonanKMcGovern

Thanks, it works out well.

Currently there is no updated release on PyPI, so others should install transformers from source:

pip install git+https://github.com/huggingface/transformers.git
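
With that newer transformers build (and a bitsandbytes release that supports 4-bit serialization), saving or pushing the 4-bit weights directly is expected to work; a minimal sketch, assuming `model` was loaded with load_in_4bit=True and the directory name is illustrative:

# Serializes the 4-bit weights directly once the PR linked above is in your install.
model.save_pretrained("my-awesome-model-4bit")
model.push_to_hub("my-awesome-model-4bit")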

ahyunsoo3 avatar Feb 13 '24 09:02 ahyunsoo3