alignment-handbook
Error when fine-tuning using SFT with QLoRA
Hello,
I tried to fine-tune a model using the SFT/QLoRA method provided in the handbook. Everything runs until the beginning of the training phase, at which point the following error occurs:
#Error#
Traceback (most recent call last):
File "/home/michelet/my_projects/fine_tuning/alignment-handbook-main/scripts/run_sft.py", line 233, in
#More details#
I'm using the following command:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_qlora.yaml --load_in_4bit=true

(Note that I left the recipe untouched, except for the percentage of samples taken from the dataset, which I reduced to speed up reproducing the issue.) I also tried recreating the venv several times and on different machines.
#What I did in the meantime#
It looks like the self.to_dict() call whose result is passed to json.dumps() returns a dictionary of training arguments (line 2471 of training_args.py). One of these arguments is itself a dictionary that contains a BitsAndBytesConfig object as one of its values. This object is not JSON serializable, but the BitsAndBytesConfig class provides a to_dict() method that converts it into a plain dictionary. I tried modifying the to_dict() method of the TrainingArguments class (line 2444 of training_args.py). I know this should not be done, but it kind of solved the problem: by retrieving the BitsAndBytesConfig object, converting it with its own to_dict() method, and replacing the object with the resulting dictionary in the arguments dictionary, the error is no longer triggered and the BitsAndBytesConfig appears in the log file. I don't know whether this is only a logging matter or whether the modification might have messed up the fine-tuning process (from what I could understand, the result of this json.dumps() call is added to a SummaryWriter, which is used for logging purposes; I'm not sure though).
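For reference, here is a rough sketch of the temporary patch I applied. The helper name is mine and the exact structure of TrainingArguments.to_dict() in your transformers version may differ; this only illustrates the workaround:

```python
# Hypothetical helper, applied to the result of TrainingArguments.to_dict()
# before it is handed to json.dumps(). Names and placement are approximate;
# this is a sketch of the workaround, not the actual transformers code.
from transformers import BitsAndBytesConfig


def _make_json_safe(args_dict):
    """Recursively replace BitsAndBytesConfig values with their dict form."""
    safe = {}
    for key, value in args_dict.items():
        if isinstance(value, BitsAndBytesConfig):
            safe[key] = value.to_dict()  # BitsAndBytesConfig provides to_dict()
        elif isinstance(value, dict):
            safe[key] = _make_json_safe(value)  # descend into nested dicts
        else:
            safe[key] = value
    return safe
```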
I don't know if this is related to my installation or to my use of the handbook/recipes. Has anyone run into the same error, or can someone reproduce it?
Have a nice day!
This could be solved by converting quantization_config to JSON in the SFT script with the to_json() method.
I realized the issue is that transformers does not recursively convert all nested configs to JSON.
Thanks for your quick answer!
Should I do this conversion in the run_sft.py file from the alignment handbook? Something like quantization_config=quantization_config.to_json() on line 120?
Edit: Just tried it and got an error saying that BitsAndBytesConfig objects do not have a to_json() method.
Ah sorry, it's to_dict():
https://github.com/huggingface/transformers/blob/e0d82534cc95b582ab072c1bbc060852ba7f9d51/src/transformers/utils/quantization_config.py#L129
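A quick standalone example of the idea (not code from the handbook): json.dumps fails on a raw BitsAndBytesConfig, but succeeds once the nested config is converted with to_dict():

```python
import json

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

# This fails because BitsAndBytesConfig is not JSON serializable:
# json.dumps({"quantization_config": bnb_config})

# Converting the nested config to a plain dict first makes it serializable:
print(json.dumps({"quantization_config": bnb_config.to_dict()}))
```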
Changing line 120 of alignment_handbook/scripts/run_sft.py from quantization_config=quantization_config to quantization_config=quantization_config.to_dict() worked!
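For anyone hitting the same thing, this is roughly what the changed line looks like in context. Other model kwargs are omitted, the exact contents depend on your handbook version, and the None guard is my addition for runs without quantization:

```python
# Sketch of the fix around line 120 of scripts/run_sft.py.
# Other keyword arguments omitted; they may differ between handbook versions.
model_kwargs = dict(
    # ...
    quantization_config=quantization_config.to_dict()
    if quantization_config is not None
    else None,
)
```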
Thank you for your help!
Should I close the issue now?