How to save a quantized model
I am having the following issue when pushing the trained 4-bit model to the Hugging Face Hub through `base_model.push_to_hub("my-awesome-model")`:
NotImplementedError: You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported
Is there any alternative to save the trained quantized model?
Is this what you are looking for? I know how to save it locally, but I don't know how to solve the push-to-hub problem.
trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=5,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="PATH_TO_SAFE_MODEL",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
Alternative code
trainer.save_model("gpt-neo-x-20b-finetuned-4bit")
trainer.model.config.save_pretrained("gpt-neo-x-20b-finetuned-4bit")
Thanks, I guess saving locally also works, although I was hoping for a way to push to the Hugging Face Hub, which is easier to use later for downstream tasks.
Hi, I have added
trainer.save_model("gpt-neo-x-20b-finetuned-4bit")
trainer.model.config.save_pretrained("gpt-neo-x-20b-finetuned-4bit")
at the bottom of https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing#scrollTo=kdCccwcOQiCP
but it gives this error. The same thing is happening in my local env. Am I doing anything wrong?
Would this save just the fine-tuned LoRA adapters in 4-bit, or the merged whole model in 4-bit? I am looking for a way to save the whole model in 4 bits after fine-tuning on a 4-bit base...
After saving the model and config, how do I load it again in Colab?
When I instead did
model.save_pretrained("my-awesome-model")
tokenizer.save_pretrained("my-awesome-model_tokenizer")
to save my model trained from andreaskoepf/pythia-1.4b-gpt4all-pretrain, here are the files I got:
Is it normal that the saved model and tokenizer are so small?
Yes, this just saves the set of adapters. You'll have to merge them after loading the base model again.
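For reference, a minimal sketch of that reload-and-merge flow, reusing the model names from earlier in this thread as placeholders (adjust the paths to your own checkpoints):
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in half precision (not 4-bit) so the merged weights can be serialized
base = AutoModelForCausalLM.from_pretrained(
    "andreaskoepf/pythia-1.4b-gpt4all-pretrain",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the saved LoRA adapters, then fold them into the base weights
model = PeftModel.from_pretrained(base, "my-awesome-model")
model = model.merge_and_unload()

# The result is a plain transformers model again and can be saved normally
model.save_pretrained("my-awesome-model-merged")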
May I ask how this differs from how we usually load a model? Currently, I am just using something like:
model = AutoModelForCausalLMWithValueHead.from_pretrained(
'saved_checkpoint_path',
load_in_8bit=True,
peft_config=qlora_config,
)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Reload the base model without 4-bit quantization so the merged weights can be saved
quantization_config = BitsAndBytesConfig(
    load_in_4bit=False,
    load_in_8bit=False,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    'model-id',
    low_cpu_mem_usage=True,
    load_in_4bit=False,
    return_dict=True,
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map='auto',
)

# Apply the trained adapters, merge them into the base weights, and dump the state dict
model = PeftModel.from_pretrained(model, adapter_path, offload_folder="/content/sample_data")
model = model.merge_and_unload()
torch.save(model.state_dict(), "merged_model_state_dict.pt")  # torch.save needs an output path
Now convert to ggml using llama.cpp and quantize
It's funny that you can train a 7B LLM in Google Colab but can't save it with the original weights.
Loading in 4-bit and saving with the normal Hugging Face Trainer does not work as of now.
model.train()

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=1,  # cannot save (same error via trainer.save_model())
    save_total_limit=1,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=False,  # NotImplementedError: You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    # group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
)

# Create a Trainer instance
trainer = Trainer(
    model=model,
    data_collator=data_collator,
    train_dataset=train_dataset,
    args=training_arguments,
)
trainer.train()
Error
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-15-252f3821ece6> in <cell line: 2>()
1 # Train model
----> 2 trainer.train()
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in save_pretrained(self, save_directory, is_main_process, state_dict, save_function, push_to_hub, max_shard_size, safe_serialization, variant, **kwargs)
1713
1714 if getattr(self, "is_loaded_in_4bit", False):
-> 1715 raise NotImplementedError(
1716 "You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported"
1717 )
This matches https://github.com/TimDettmers/bitsandbytes/issues/695, which is still open as of now.
However, with SFTTrainer I am able to save the adapter weights and config with a model loaded in 4-bit:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset['train'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
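For completeness, a rough sketch of the save step that works in this setup, assuming the variables defined above; because SFTTrainer wraps the model in a PEFT model via peft_config, only the adapter weights and config get written:
trainer.train()
# Saves only adapter_model.bin and adapter_config.json, so it does not hit
# the 4-bit `save_pretrained` restriction on the base model
trainer.save_model(output_dir)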
This issue can probably be closed.
Here is the PR to follow on this: https://github.com/TimDettmers/bitsandbytes/pull/753
@RonanKMcGovern
Thanks, it works out well.
Currently there is no updated release on PyPI, so others should use this command:
pip install git+https://github.com/huggingface/transformers.git
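With transformers installed from main (and, as far as I understand, a recent bitsandbytes that includes the serialization support from the PR above), the original save/push calls should go through; a quick sketch, reusing the repo name from the first post as a placeholder:
model.save_pretrained("my-awesome-model")  # no longer raises NotImplementedError for 4-bit models
model.push_to_hub("my-awesome-model")      # pushes the quantized weights to the Hub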