
Issues with Upgraded Pip Libs for LoRA Weights in 4-bit and 8-bit Training

Open orellavie1212 opened this issue 1 year ago • 9 comments

Dear Support Team,

I recently upgraded my pip libraries, including transformers, peft, accelerate, and bitsandbytes, to support 4-bit training as opposed to the original 8-bit training. After doing so, I successfully completed the finetune process, but discovered that loading the LoRA weights had no effect on the vanilla model.

I then investigated further: I retrained the 8-bit model using the older library versions and attempted the same 8-bit training with the new versions. With the new versions, however, I only obtained the vanilla 8-bit model's answers, which did not match those from the older 8-bit run. I checked for differences in peft_model.py, and specifically in the save_and_load logic for adapters, but didn't find anything useful. This leads me to believe that the bug is located elsewhere, and I am not able to pinpoint its exact location.

Unfortunately, I cannot compare the two 4-bit versions directly since the older version does not support 4-bit training.

My suspicion is that there may be a low-level API issue or another aspect preventing the model weights from loading properly in the newer versions. Alternatively, it is possible that the adapter LoRA weights are not training at all during the process.
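
To check the second possibility, I've been verifying that the LoRA parameters are even marked trainable after the PEFT wrapping, along these lines (a rough sketch, not the actual finetune.py code; the base model name and LoRA config values are placeholders):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model and LoRA config; the real values come from finetune.py.
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf", load_in_8bit=True, device_map="auto"
)
lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Should report a small but non-zero number of trainable parameters.
model.print_trainable_parameters()
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
assert any("lora" in n for n in trainable), "no LoRA parameters are trainable"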

According to a suggestion I found online, we may need to call .cuda() after loading the pretrained model in finetune.py, and load the LoRA weights after quantization (.cuda()) in generate.py, if that helps at all.

I would appreciate any assistance or insights you could provide to help resolve these issues.

The newer versions of libraries (from requirements_optional_4bit.txt):

bitsandbytes==0.39.0
transformers @ git+https://github.com/huggingface/transformers.git@17a55534f5e5df10ac4804d4270bf6b8cc24998d
accelerate @ git+https://github.com/huggingface/accelerate.git@7d24bdefb5b3252505151d8c1ac0efbed3574857
peft @ git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2

The older versions of libraries (from requirements.txt; these actually work well, but have no 4-bit QLoRA):

bitsandbytes==0.38.1
transformers==4.28.1
accelerate==0.18.0
peft @ git+https://github.com/huggingface/peft.git@098962fa6515f2e4fe83a757f5995d3ffbb1c373

I also opened a bug in PEFT (https://github.com/huggingface/peft/issues/512) in case the issue does not concern h2ogpt at all.

orellavie1212 avatar May 28 '23 15:05 orellavie1212

Thanks. @arnocandel might have some insights as he did training in 4-bit recently, while I've not done that yet.

We have had an issue in the past where the LoRA state was only saved in the checkpoints, and the final saved model was essentially an empty shell. I had to copy the checkpoint model over the adapter model and use that instead. I'm unsure if that was ever resolved/understood, @arnocandel?

pseudotensor avatar May 29 '23 07:05 pseudotensor

Yes, that is the problem! I found out the dict is empty when checking the adapter weights. How did you fix it specifically? I'll try to replicate it. In the checkpoints there is a difference between adapter_model.bin and the checkpoint's pytorch_model.bin, so I am not sure how to turn the checkpoint into the final adapter_model.bin. Thanks @pseudotensor
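
For reference, here is the check I used to confirm the adapter file is empty (a minimal sketch; the path is a placeholder for my output directory):

import torch

# Placeholder path to the finetune output directory.
state_dict = torch.load("output_dir/adapter_model.bin", map_location="cpu")
print(len(state_dict), "tensors in adapter_model.bin")   # 0 here, which is the problem
print([k for k in state_dict if "lora" in k][:5])        # expected: lora_A / lora_B keys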

orellavie1212 avatar May 29 '23 07:05 orellavie1212

@orellavie1212 I just ensured there were checkpoints saved every so often, then copied the last checkpoint's pytorch_model.bin over to adapter_model.bin.

E.g.:

jon@pseudotensor:/data/jon/snap_llama3/llama-30b-hf.h2oaiopenassistant_oasst1_h2ogpt.8.0_epochs.31eef248d53c9f39e51c60b8b030c1e3cafc34b0.llama30b_7$ ls -alrt
total 798944
drwx------  3 jon jon      4096 Apr 26 22:28 runs/
drwx------  2 jon jon      4096 Apr 26 23:29 checkpoint-6000/
drwx------  2 jon jon      4096 Apr 27 00:29 checkpoint-12000/
drwx------  2 jon jon      4096 Apr 27 01:30 checkpoint-18000/
drwx------  2 jon jon      4096 Apr 27 02:31 checkpoint-24000/
drwx------  2 jon jon      4096 Apr 27 03:32 checkpoint-30000/
drwx------  2 jon jon      4096 Apr 27 04:33 checkpoint-36000/
drwx------  2 jon jon      4096 Apr 27 05:34 checkpoint-42000/
drwx------  2 jon jon      4096 Apr 27 06:35 checkpoint-48000/
-rw-------  1 jon jon       380 Apr 27 06:38 adapter_config.json
drwx------ 11 jon jon      4096 Apr 27 06:38 ./
drwx------ 18 jon jon      4096 Apr 27 22:10 ../
-rw-------  1 jon jon 818063245 Apr 28 00:00 adapter_model.bin
jon@pseudotensor:/data/jon/snap_llama3/llama-30b-hf.h2oaiopenassistant_oasst1_h2ogpt.8.0_epochs.31eef248d53c9f39e51c60b8b030c1e3cafc34b0.llama30b_7$ 
jon@pseudotensor:/data/jon/snap_llama3/llama-30b-hf.h2oaiopenassistant_oasst1_h2ogpt.8.0_epochs.31eef248d53c9f39e51c60b8b030c1e3cafc34b0.llama30b_7$ ls -alrt checkpoint-48000/
total 2403104
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_7.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_6.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_5.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_4.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_3.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_2.pth
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_1.pth
-rw-------  1 jon jon       3899 Apr 27 06:35 training_args.bin
-rw-------  1 jon jon     499723 Apr 27 06:35 tokenizer.model
-rw-------  1 jon jon        715 Apr 27 06:35 tokenizer_config.json
-rw-------  1 jon jon        423 Apr 27 06:35 special_tokens_map.json
-rw-------  1 jon jon  818063245 Apr 27 06:35 pytorch_model.bin
-rw-------  1 jon jon        627 Apr 27 06:35 scheduler.pt
-rw-------  1 jon jon        557 Apr 27 06:35 scaler.pt
-rw-------  1 jon jon 1636183613 Apr 27 06:35 optimizer.pt
-rw-------  1 jon jon    5855915 Apr 27 06:35 trainer_state.json
-rw-------  1 jon jon      14583 Apr 27 06:35 rng_state_0.pth
drwx------  2 jon jon       4096 Apr 27 06:35 ./
drwx------ 11 jon jon       4096 Apr 27 06:38 ../
jon@pseudotensor:/data/jon/snap_llama3/llama-30b-hf.h2oaiopenassistant_oasst1_h2ogpt.8.0_epochs.31eef248d53c9f39e51c60b8b030c1e3cafc34b0.llama30b_7$ 

So you can see the adapter_model.bin was copied over later from checkpoint-48000/pytorch_model.bin.
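
The copy itself is trivial, e.g. something like this (a sketch in Python; the run directory matches the listing above, adjust to your own output path):

import shutil

# Run directory from the listing above; adjust to your own output path.
run_dir = "/data/jon/snap_llama3/llama-30b-hf.h2oaiopenassistant_oasst1_h2ogpt.8.0_epochs.31eef248d53c9f39e51c60b8b030c1e3cafc34b0.llama30b_7"
# Overwrite the (empty) adapter_model.bin with the last checkpoint's weights.
shutil.copyfile(f"{run_dir}/checkpoint-48000/pytorch_model.bin", f"{run_dir}/adapter_model.bin")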

pseudotensor avatar May 29 '23 07:05 pseudotensor

In the main directory I have: adapter_config.json, adapter_model.bin, checkpoint-69, checkpoint-72, checkpoint-75, runs. In the last checkpoint directory (checkpoint-75): optimizer.pt, pytorch_model.bin, rng_state.pth, scaler.pt, scheduler.pt, special_tokens_map.json, tokenizer_config.json, tokenizer.json, trainer_state.json, training_args.bin.

So I only need to rename the checkpoint's pytorch_model.bin to adapter_model.bin in the main directory? Are they actually the same (the last checkpoint, of course)?

orellavie1212 avatar May 29 '23 07:05 orellavie1212

It's not super clean, but you can get an idea of what is required for LoRA from https://huggingface.co/h2oai/h2ogpt-research-oig-oasst1-512-30b-lora/tree/main for the same 30B.

You need adapter_config.json, adapter_model.bin, tokenizer.model, and tokenizer_config.json in the path used below.

I think with those alone, you can do:

python generate.py --base_model=<base model HF name or local path> --lora_weights=<path to those lora files>

And that should work.
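
That is roughly equivalent to doing the following by hand (a sketch, not the exact generate.py code; paths are placeholders and generate.py handles device placement and quantization itself):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder base model and LoRA directory.
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf", load_in_8bit=True, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-30b-hf")
# Reads adapter_config.json and adapter_model.bin from the LoRA directory.
model = PeftModel.from_pretrained(base, "/path/to/lora_dir")
model.eval()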

Similarly, with finetune.py one can pass lora_weights in and continue tuning.

pseudotensor avatar May 29 '23 07:05 pseudotensor

E.g. gpt2.h2o.ai and my own computer were for a while running 30B as LoRA only, with files like:

total 799408
-rw-rw-r--  1 jon jon    499723 Apr 28 22:30 tokenizer.model
-rw-rw-r--  1 jon jon       715 Apr 28 22:30 tokenizer_config.json
-rw-rw-r--  1 jon jon       423 Apr 28 22:30 special_tokens_map.json
-rw-------  1 jon jon 818063245 Apr 28 22:32 adapter_model.bin
-rw-------  1 jon jon       380 Apr 28 22:33 adapter_config.json
drwx------  2 jon jon      4096 May  6 01:58 ./
drwx------ 85 jon jon      4096 May 11 22:06 ../
jon@pseudotensor:/data/jon/llama-30b-hf.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v2.2.0_epochs.131f6d098b43236b5f91e76fc074ad089d6df368.llama30b_17$ 

and

GPT_H2O_AI=1 SAVE_DIR=./save/ CONCURRENCY_COUNT=1 python generate.py --base_model=decapoda-research/llama-30b-hf --lora_weights=/data/jon/llama-30b-hf.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v2.2.0_epochs.131f6d098b43236b5f91e76fc074ad089d6df368.llama30b_17 --height=800 --use_auth_token=True --infer_devices=False --share=True --prompt_type=human_bot --langchain_mode='wiki_full' --visible_langchain_modes="['wiki_full', 'UserData', 'MyData', 'github h2oGPT', 'DriverlessAI docs']"

You probably won't do wiki_full but you get the idea.

pseudotensor avatar May 29 '23 08:05 pseudotensor

I have adapter_model.bin, but it is empty, as I said. I tried to understand what you suggested, but the only option I see is to take pytorch_model.bin from the checkpoint folder, rename it to adapter_model.bin, and hope it works. The current adapter_model.bin is empty (I checked by loading it with torch.load).

You said that you successfully loaded a checkpoint, but in

GPT_H2O_AI=1 SAVE_DIR=./save/ CONCURRENCY_COUNT=1 python generate.py --base_model=decapoda-research/llama-30b-hf --lora_weights=/data/jon/llama-30b-hf.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v2.2.0_epochs.131f6d098b43236b5f91e76fc074ad089d6df368.llama30b_17 --height=800 --use_auth_token=True --infer_devices=False --share=True --prompt_type=human_bot --langchain_mode='wiki_full' --visible_langchain_modes="['wiki_full', 'UserData', 'MyData', 'github h2oGPT', 'DriverlessAI docs']"

the --lora_weights flag points to /data/jon/llama-30b-hf.h2oaih2ogpt-oig-oasst1-instruct-cleaned-v2.2.0_epochs.131f6d098b43236b5f91e76fc074ad089d6df368.llama30b_17, which contains the adapter_model.bin that is now empty... The only .bin file in the checkpoint folder (checkpoint-75) is pytorch_model.bin, which I could rename and move to the main folder as adapter_model.bin.

orellavie1212 avatar May 29 '23 08:05 orellavie1212

@orellavie1212 Yes, what I meant is that you should copy the checkpoint's pytorch_model.bin over the bad adapter_model.bin.

pseudotensor avatar May 29 '23 08:05 pseudotensor

Yes, the weights actually loaded successfully now. Any idea whether the last checkpoint corresponds exactly to the end of training, or could I have missed part of an epoch, so that the real adapter_model.bin would have been somewhat further along?
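
In the meantime I'm checking trainer_state.json in the last checkpoint to see how far training actually got (a quick sketch; the path is a placeholder, and global_step and epoch are standard HF Trainer fields):

import json

# Placeholder path to the last checkpoint directory.
with open("output_dir/checkpoint-75/trainer_state.json") as f:
    state = json.load(f)
print(state["global_step"], state["epoch"])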

orellavie1212 avatar May 29 '23 08:05 orellavie1212

This was fixed in the PR above.

pseudotensor avatar Jun 23 '23 05:06 pseudotensor