LLaVA
Anyres-compatible fine-tuning of LLaVA-1.6 Mistral 7B and 34B
Low-rank (LoRA) fine-tuning with anyres for the LLaVA-NeXT models :)
That is a good PR. I am fine-tuning with this PR. Thanks.
@arielnlee
Ofc, glad you found it useful! I'm sure the author's version is far superior (<3 llava), but wanted to leave this here for others to use until we get the real magic :)
@awzhgw
@arielnlee I encountered an issue during the training process. I am using the LoRA fine-tuning method, and my data consists of two parts:
1. Lots of pure text question-answering dialogues.
2. Image question-answering dialogues.
During training, I found that the training speed for the pure text part of the dataset is very slow, just as slow as the image part. After investigation, I found that the reason is in the __getitem__ method of the LazySupervisedDataset class in train.py:
if 'image' in self.list_data_dict[i]:
    data_dict['image'] = image
    data_dict['image_size'] = image_size
elif self.data_args.is_multimodal:
    # image does not exist in the data, but the model is multimodal
    crop_size = self.data_args.image_processor.crop_size
    data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
    data_dict['image_size'] = crop_size
return data_dict
When I delete this code:
elif self.data_args.is_multimodal:
    # image does not exist in the data, but the model is multimodal
    crop_size = self.data_args.image_processor.crop_size
    data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
    data_dict['image_size'] = crop_size
return data_dict
the training process fails with this error:
Traceback (most recent call last):
File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train_mem.py", line 9, in <module>
train(attn_implementation="flash_attention_2")
File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train.py", line 1092, in train
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2744, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1958, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1964, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 2152, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
How can I make fine-tuning on the pure text data run at full speed?
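For context on why deleting that branch breaks backprop: with the dummy all-zero image, every text-only sample still runs through the vision tower and mm_projector, so all trainable parameters end up in the autograd graph; without it, those parameters never get a grad_fn and DeepSpeed/ZeRO raises the error above. A minimal, self-contained sketch of the cheaper alternative, a zero-weighted dummy pass that keeps the parameters in the graph without encoding a full zero image (the modules here are toy stand-ins, not the real LLaVA code):

import torch
import torch.nn as nn

# 'projector' stands in for mm_projector, 'embed' for the token embedding.
embed = nn.Embedding(100, 16)
projector = nn.Linear(8, 16)

input_ids = torch.randint(0, 100, (1, 5))
text_embeds = embed(input_ids)

# Zero-weighted dummy pass: numerically a no-op, but it keeps the projector's
# parameters connected to the loss for text-only batches.
dummy_feature = torch.zeros(1, 8)
text_embeds = text_embeds + (0.0 * projector(dummy_feature)).sum()

loss = text_embeds.sum()
loss.backward()
print(projector.weight.grad is not None)  # True: the projector participated

If I remember correctly, some revisions of the upstream llava_arch.py use exactly this trick (something like (0. * mm_projector(vision_tower.dummy_feature)).sum()), which is far cheaper than pushing a full zero image through CLIP for every text-only sample.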
Can I do this and still end up with a well-trained LLaVA model?
I got adapter_model.safetensors instead of adapter_model.bin after LoRA fine-tuning of 1.6-mistral, and I'm getting this error:
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.30it/s]
Traceback (most recent call last):
File "/home/rohith/LLaVA-1.6-ft/scripts/merge_lora_weights.py", line 22, in
when trying to merge the model.
@rohithbojja Maybe the model_path is wrong. Please share your model_path and model_base args.
@rohithbojja
nohup python scripts/merge_lora_weights.py --model-path=../checkpoints/llava-v1.6-34b-xxx-lora-5000 --model-base=../checkpoints/llava-v1.6-34b --save-model-path=../checkpoints/llava-v1.6-34b-xxx-5000 &
I've fixed it by adding "lora" to the model-path.
Can you please provide some examples of your training data?
system="""<|im_start|>system\nAnswer the questions.""",
roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),
I was wondering why you chose to add a new conversation format. I was trying to fine-tune based on your PR with my existing data made for LLaVA-1.5 fine-tuning, which uses the 'v1' conversation version, but I'm currently running into issues where the tokenizer lengths are mismatching.
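If it helps with debugging the mismatch, here is a hedged sketch for comparing what two conversation templates actually produce (the template names are the --version values used in this thread, so substitute the name added by this PR; the sample turn is made up):

# Print the prompts produced by each template so the differing role/separator
# tokens that cause the tokenization mismatch become visible.
from llava.conversation import conv_templates

sample_question = "What is shown in the image?\n<image>"
sample_answer = "A normal chest X-ray."

for name in ("v1", "mistral_instruct"):
    conv = conv_templates[name].copy()
    conv.append_message(conv.roles[0], sample_question)
    conv.append_message(conv.roles[1], sample_answer)
    print(f"--- {name} ---")
    print(repr(conv.get_prompt()))

Data built around the 'v1' separators generally needs the matching --version at training time; otherwise the target-masking logic counts tokens against the wrong separator and reports length mismatches.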
Check out my wandb logs:
https://wandb.ai/21b81a66a5/huggingface/runs/4pslu1px/overview?nw=nwuser21b81a66a5
And the notebook I used for training:
https://colab.research.google.com/drive/10OG4JsmSZ6kd8pyDxxhjHWkhK2ZOgVH4
#!/bin/bash
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 16 --lora_alpha 32 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path /home/rohith/llava-v1.6-mistral-7b-bnb-4bit/ \
    --version mistral_instruct \
    --data_path /home/rohith/Desktop/vqa/vqa/images/filtered_dataset.json \
    --image_folder /home/rohith/Desktop/vqa/vqa/images/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --mm_patch_merge_type spatial_unpad \
    --image_aspect_ratio anyres \
    --group_by_modality_length False \
    --bf16 False \
    --fp16 True \
    --output_dir /home/rohith/LLaVA-1.6-ft/llava_lora_mistral_med/ \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.05 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
Using this script gives me the error:
ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.
Using the original model doesn't give any error. I used the panoyo9829/llava-v1.6-mistral-7b-bnb-4bit model.
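That ValueError is what bitsandbytes raises whenever an already-quantized checkpoint gets moved or cast again, which is presumably what the training script does when it casts the model and vision tower; a full-precision checkpoint tolerates this, a pre-quantized 4-bit dump does not. A hedged sketch of the alternative route, loading the full-precision base model and quantizing at load time instead of starting from the pre-quantized repo (the llava.model import path and checkpoint name are assumptions based on the upstream LLaVA-NeXT codebase):

import torch
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
from llava.model import LlavaMistralForCausalLM  # assumed import path

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

# Quantize the full-precision checkpoint here rather than pointing
# --model_name_or_path at an already-quantized bnb dump.
model = LlavaMistralForCausalLM.from_pretrained(
    "liuhaotian/llava-v1.6-mistral-7b",
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

If I recall correctly, the upstream train.py also exposes a --bits 4 option that sets this up internally, which may be simpler than loading the model yourself.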
Can anyone share the filtered_dataset json for the 34b training?
Yours, Alex
@findalexli Use this to download the dataset:
https://drive.google.com/file/d/1gYLOFaz7Mn-E2u9ksT0R2BOai7MnmNcm/view?usp=drivesdk
It has the following structure:
VQA
  1. images
     |_ img1
     |_ img2
  2. train
     |_ filtered_dataset.json
One image is truncated; remove it. Use this script to detect it:
from PIL import Image
import os

trunk_ = 0

def is_truncated(image_path):
    try:
        # Open the image file
        img = Image.open(image_path)
        # Check if the image is truncated by trying to load it
        img.load()
        return False  # Image is not truncated
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return True  # Image is truncated or corrupt

def check_for_truncated_images(directory, trunk_):
    # Iterate through all files in the directory
    for filename in os.listdir(directory):
        # Check if the file is an image
        if filename.endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp')):
            image_path = os.path.join(directory, filename)
            if is_truncated(image_path):
                print(f"The image {filename} in directory {directory} is truncated.")
                trunk_ = 1
            else:
                trunk_ = 0
            print(trunk_)

directory_path = '/workspace/vqa/images'
check_for_truncated_images(directory_path, 0)
Also remove the corresponding entry in the JSON; otherwise you'll end up failing at around 30%.
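A rough companion sketch for that JSON-pruning step, assuming the standard LLaVA data format where each record is a dict with an optional 'image' filename (the paths here are illustrative):

import json
import os
from PIL import Image

data_path = '/workspace/vqa/train/filtered_dataset.json'
image_dir = '/workspace/vqa/images'

with open(data_path) as f:
    records = json.load(f)

def image_ok(rec):
    if 'image' not in rec:
        return True  # text-only entry, nothing to check
    try:
        # Fully decode the image so truncated files are caught.
        Image.open(os.path.join(image_dir, rec['image'])).load()
        return True
    except Exception as e:
        print(f"dropping {rec['image']}: {e}")
        return False

clean = [r for r in records if image_ok(r)]
print(f"kept {len(clean)} of {len(records)} records")
with open(data_path.replace('.json', '_clean.json'), 'w') as f:
    json.dump(clean, f)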
Good luck
Hi, thanks for working on a private version of anyres LLaVA.
I have fine-tuned vicuna-v1.5-7b with anyres / spatial_unpad in the same configuration as above, but the result doesn't work out well on lmms-eval, with an MME score of 357 / 224 (LLaVA-v1.5-7B: 1519 / 332).
Have you run any evaluation on public benchmarks and gotten similar scores?
Hi! Thanks for sharing!
However, when I execute your training script, it also trains the vision encoder (adapter_model.safetensors contains vision encoder weights).
Is there a way to disable gradient backprop into these weights, as is done in the original LLaVA fine-tuning?
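Not an authoritative answer, but here is a hedged sketch of how one might check what ended up in the adapter and keep the vision encoder out of it; the keyword filter loosely mirrors the one the upstream train.py uses when collecting LoRA target modules, so treat the names as assumptions:

import torch
from safetensors.torch import load_file

# (a) Inspect the saved adapter: keys mentioning 'vision_tower' mean the
# encoder was wrapped with LoRA or otherwise left trainable.
state = load_file('adapter_model.safetensors')
vision_keys = [k for k in state if 'vision_tower' in k]
print(f"{len(vision_keys)} vision-tower tensors out of {len(state)} total")

# (b) When building the LoraConfig, skip modules whose names contain these
# keywords so the projector and vision encoder stay out of the adapter.
multimodal_keywords = ['mm_projector', 'vision_tower', 'vision_resampler']

def lora_target_modules(model):
    names = set()
    for name, module in model.named_modules():
        if any(kw in name for kw in multimodal_keywords):
            continue
        if isinstance(module, torch.nn.Linear):
            names.add(name.split('.')[-1])
    names.discard('lm_head')
    return sorted(names)

# (c) Belt and braces: freeze the vision tower explicitly before training.
# for p in model.get_vision_tower().parameters():
#     p.requires_grad_(False)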