
How to fine-tune the LLaVA-7b model?

Open yunh-w opened this issue 1 year ago • 4 comments

Question

Hi, thanks for your great work!

I use the following command to fine-tune the LLaVA-7b model:

$PYTHON --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    llava/train/train_mem.py \
    --model_name_or_path LLaMA-7b-convert \
    --data_path $data_path \
    --image_folder $image_folder \
    --vision_tower $vision_tower \
    --pretrain_mm_mlp_adapter LLaVA-7b-pretrain-projector-v0-CC3M-595K-original_caption.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end True \
    --bf16 True \
    --output_dir ./checkpoints/llava-7B_new \
    --num_train_epochs 5 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb

But I get three weight files, while your released LLaVA-7b weights consist of two. And I get an error when loading these fine-tuned weights. How should I fine-tune LLaVA-7b? Thanks so much!


OSError: Unable to load weights from pytorch checkpoint file for 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin' at 'LLaVA-main/checkpoints/llava-7B_new/checkpoint-5/pytorch_model-00003-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I found that the third shard was not saved completely: saving hit an out-of-memory error, but the training did not stop. Thanks.
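For reference, a small sanity check like the sketch below (my own, not a script from the repo; the checkpoint path is taken from the error above) can confirm which shard is truncated, since a shard cut short by an OOM will fail to deserialize:

```python
# Verify that every shard listed in the checkpoint index exists and can be
# loaded. A shard truncated by an OOM during saving will raise an error here.
import json
import os
import torch

ckpt = "LLaVA-main/checkpoints/llava-7B_new/checkpoint-5"  # path from the error above
with open(os.path.join(ckpt, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard in sorted(set(index["weight_map"].values())):
    try:
        torch.load(os.path.join(ckpt, shard), map_location="cpu")
        print("OK ", shard)
    except Exception as e:
        print("BAD", shard, "->", e)
```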

yunh-w, May 10 '23

Me too! After I fine-tune the 7B model, I get three .bin files, but the release has two. The files I get from fine-tuning are also very large: the total_size in "pytorch_model.bin.index.json" is 26970595328, while the released one is only 13485301760.


Chen-Song, May 10 '23

Hi @Chen-Song, you may notice that the size of your trained model is roughly 2x the size of the released checkpoints. This is because transformers saves the model weights in float32. When I release the weights, I convert them to float16 to save storage space and bandwidth.
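The numbers reported in this thread line up with that, as a quick check shows:

```python
# Quick arithmetic on the sizes reported above (raw weight bytes only):
fp32_bytes = 26970595328   # total_size of the fine-tuned fp32 checkpoint
fp16_bytes = 13485301760   # total_size of the released fp16 weights
print(fp32_bytes / fp16_bytes)   # ~2.0 -> 4 bytes vs 2 bytes per parameter
print(fp32_bytes / 4)            # ~6.74e9 parameters, i.e. the 7B model
```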

@yunh-w Can you share the size of your trained model weights with ls -lt, like @Chen-Song did? Thanks.

haotian-liu, May 10 '23

@haotian-liu What is the process to convert float32 to float16? I have a 13B fine-tuned model that is 50G.

codybum, May 16 '23

@codybum You can use this script to compress the model. Please make sure to set two different paths (do not overwrite the fp32 model), and only delete the fp32 source model after verifying that the converted model works properly. Thanks.
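The core of the conversion is just loading the weights and casting them to half precision before re-saving. A minimal sketch of that idea (not the linked script itself; AutoModelForCausalLM stands in for the project's own model class, and the paths are placeholders):

```python
# Sketch of fp32 -> fp16 conversion. Keep source and destination separate so
# the fp32 original survives until the converted model is verified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./checkpoints/llava-13B_new"        # fp32 training output (placeholder)
dst = "./checkpoints/llava-13B_new-fp16"   # separate output directory

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.float32)
model = model.half()                       # cast all weights to float16
model.save_pretrained(dst)

AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```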

haotian-liu, May 16 '23

How can we fine-tune it on custom data, and what format should the dataset be in?

anonymous-atom, Oct 13 '23

@anonymous-atom Here is an example dataset: https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/detail_23k.json

You just need to take your data and convert it to this format (roughly the shape sketched below). You can then use the training scripts, substituting your dataset as the training set.
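Roughly, each record in that file pairs an image with a list of conversation turns. A small sketch of writing data in that shape (field values and the output filename are placeholders):

```python
# Instruction-tuning record layout used by files like detail_23k.json: a list
# of entries, each with an "id", an "image" filename relative to --image_folder,
# and alternating "human"/"gpt" turns; "<image>" marks where the image goes.
import json

records = [
    {
        "id": "000000001",
        "image": "example.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe the image in detail."},
            {"from": "gpt", "value": "A detailed placeholder description of the image."},
        ],
    }
]

with open("my_finetune_data.json", "w") as f:
    json.dump(records, f, indent=2)
```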

codybum, Oct 13 '23

@yunh-w Hi, what hardware did you use?

ali7919, Jan 04 '24