
Anyres-compatible fine-tuning of LLaVA-1.6 Mistral 7B and 34B

Open arielnlee opened this issue 10 months ago • 13 comments

Low-rank (LoRA) fine-tuning with anyres for the LLaVA-NeXT models :)

arielnlee avatar Mar 27 '24 20:03 arielnlee

That is a good PR. I am fine-tuning with this PR. Thanks!

@arielnlee

awzhgw avatar Apr 12 '24 05:04 awzhgw

Ofc, glad you found it useful! I'm sure the author's version is far superior (<3 llava), but wanted to leave this here for others to use until we get the real magic :)

@awzhgw

arielnlee avatar Apr 12 '24 21:04 arielnlee

@arielnlee I encountered an issue during the training process. I am using the LoRA fine-tuning method, and my data consists of two parts:

1. lots of pure text question-answering dialogues, and
2. image question-answering dialogues.

During training, I found that the first (pure text) part of the dataset trains very slowly, roughly as slowly as the image part. After investigating, I found the reason:

In the __getitem__ method of the LazySupervisedDataset class in train.py:

        if 'image' in self.list_data_dict[i]:
            data_dict['image'] = image
            data_dict['image_size'] = image_size
        elif self.data_args.is_multimodal:
            # image does not exist in the data, but the model is multimodal
            crop_size = self.data_args.image_processor.crop_size
            data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
            data_dict['image_size'] = crop_size
        return data_dict

When I delete this code:

            elif self.data_args.is_multimodal:
                # image does not exist in the data, but the model is multimodal
                crop_size = self.data_args.image_processor.crop_size
                data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
                data_dict['image_size'] = crop_size
            return data_dict

the training process fails with this error:

Traceback (most recent call last):
  File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train_mem.py", line 9, in <module>
    train(attn_implementation="flash_attention_2")
  File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train.py", line 1092, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2744, in training_step
    self.accelerator.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1958, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1964, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 2152, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
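
For reference, this is the generic error PyTorch raises whenever .backward() is called on a loss that no tensor requiring gradients contributed to. A minimal standalone repro of just the error (plain PyTorch only, not the LLaVA code path):

import torch

# A loss built only from tensors with requires_grad=False has no grad_fn,
# so calling backward() raises the same RuntimeError as in the traceback above.
x = torch.randn(4, 8)   # stands in for activations with no trainable parameters attached
loss = (x * 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn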

How can I make fine-tuning on the pure text portion run fast?

Is it safe to do this? In the end I want a well-trained LLaVA model.

awzhgw avatar Apr 14 '24 09:04 awzhgw

I got adapter_model.safetensors instead of adapter_model.bin after LoRA fine-tuning of 1.6-mistral, and I get the following error when trying to merge the model:

Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.30it/s]
Traceback (most recent call last):
  File "/home/rohith/LLaVA-1.6-ft/scripts/merge_lora_weights.py", line 22, in <module>
    merge_lora(args)
  File "/home/rohith/LLaVA-1.6-ft/scripts/merge_lora_weights.py", line 8, in merge_lora
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
  File "/home/rohith/LLaVA-1.6-ft/llava/model/builder.py", line 112, in load_pretrained_model
    mm_projector_weights = torch.load(os.path.join(model_path, 'mm_projector.bin'), map_location='cpu')
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/rohith/Documents/mistral-llava/mm_projector.bin'

rohithbojja avatar Apr 17 '24 12:04 rohithbojja

@rohithbojja Maybe your model_path is wrong. Please share your model_path and model_base args.

awzhgw avatar Apr 18 '24 01:04 awzhgw

@rohithbojja

nohup python scripts/merge_lora_weights.py --model-path=../checkpoints/llava-v1.6-34b-xxx-lora-5000 --model-base=../checkpoints/llava-v1.6-34b --save-model-path=../checkpoints/llava-v1.6-34b-xxx-5000 &

awzhgw avatar Apr 18 '24 01:04 awzhgw

@rohithbojja

nohup python scripts/merge_lora_weights.py --model-path=../checkpoints/llava-v1.6-34b-xxx-lora-5000 --model-base=../checkpoints/llava-v1.6-34b --save-model-path=../checkpoints/llava-v1.6-34b-xxx-5000 &

I've fixed it by adding "lora" to the model-path.
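
For context, the reason the folder name matters (as far as I can tell): load_pretrained_model in llava/model/builder.py only takes the LoRA-merge path when "lora" appears in the model name; otherwise, when model_base is set, it looks for a standalone mm_projector.bin, which is the FileNotFoundError above. A rough sketch of that branching, paraphrased from memory rather than copied from the source:

import os

def get_model_name_from_path(model_path):
    # Simplified stand-in for the helper in llava/mm_utils.py: take the last path component.
    return os.path.basename(model_path.rstrip('/'))

def pick_loading_path(model_path, model_base):
    # Rough paraphrase of the branching in llava/model/builder.py (details may differ).
    model_name = get_model_name_from_path(model_path)
    if 'lora' in model_name.lower() and model_base is not None:
        return 'lora-merge'        # load base model, apply adapter_model.safetensors/.bin, merge
    elif model_base is not None:
        return 'base+projector'    # expects <model_path>/mm_projector.bin to exist
    return 'full-model'

print(pick_loading_path('../checkpoints/llava-v1.6-34b-xxx-lora-5000', '../checkpoints/llava-v1.6-34b'))
# -> 'lora-merge'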

rohithbojja avatar Apr 18 '24 15:04 rohithbojja

Can you please provide some example of your training data?

system="""<|im_start|>system\nAnswer the questions.""",
roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),

I was wondering why you chose to add a new conversation format. I was trying to fine-tune based on your PR with my existing data made for LLaVA-1.5 fine-tuning, which uses the 'v1' conversation version, but I am currently running into issues where the tokenizer lengths mismatch.
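
For reference, my existing LLaVA-1.5-style records follow the standard layout; roughly, one entry looks like this once the JSON is loaded in Python (the id, image path, and text below are placeholders, not real data):

# One record from a LLaVA-1.5-style fine-tuning JSON (placeholder values).
record = {
    "id": "000001",
    "image": "images/000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown in this picture?"},
        {"from": "gpt", "value": "A short free-form answer goes here."},
    ],
}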

findalexli avatar Apr 20 '24 23:04 findalexli

Check out my WandB logs.

https://wandb.ai/21b81a66a5/huggingface/runs/4pslu1px/overview?nw=nwuser21b81a66a5

And my notebook used to train

https://colab.research.google.com/drive/10OG4JsmSZ6kd8pyDxxhjHWkhK2ZOgVH4

rohithbojja avatar Apr 21 '24 16:04 rohithbojja

#!/bin/bash

deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 16 --lora_alpha 32 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path /home/rohith/llava-v1.6-mistral-7b-bnb-4bit/ \
    --version mistral_instruct \
    --data_path /home/rohith/Desktop/vqa/vqa/images/filtered_dataset.json \
    --image_folder /home/rohith/Desktop/vqa/vqa/images/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --mm_patch_merge_type spatial_unpad \
    --image_aspect_ratio anyres \
    --group_by_modality_length False \
    --bf16 False \
    --fp16 True \
    --output_dir /home/rohith/LLaVA-1.6-ft/llava_lora_mistral_med/ \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.05 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb

Using this script gives me the following error:

ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.

Using the original model doesn't give any error. I used the panoyo9829/llava-v1.6-mistral-7b-bnb-4bit model.

rohithbojja avatar Apr 24 '24 11:04 rohithbojja

Can anyone share the filtered_dataset json for the 34b training?

findalexli avatar Apr 24 '24 15:04 findalexli

@findalexli Use this to download dataset

https://drive.google.com/file/d/1gYLOFaz7Mn-E2u9ksT0R2BOai7MnmNcm/view?usp=drivesdk

It has the following structure:

VQA
  1. images
     |_ img1
     |_ img2
  2. train
     |_ filtered_dataset.json

One image is truncated. Remove it; use this script to detect it:

from PIL import Image
import os

trunk_ = 0

def is_truncated(image_path):
    try:
        # Open the image file
        img = Image.open(image_path)
        # Check if the image is truncated by trying to load it
        img.load()
        return False  # Image is not truncated
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return True  # Image is truncated or corrupt

def check_for_truncated_images(directory, trunk_):
    # Iterate through all files in the directory
    for filename in os.listdir(directory):
        # Check if the file is an image
        if filename.endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp')):
            image_path = os.path.join(directory, filename)
            if is_truncated(image_path):
                print(f"The image {filename} in directory {directory} is truncated.")
                trunk_ = 1
            else:
                trunk_ = 0
            print(trunk_)

directory_path = '/workspace/vqa/images'
check_for_truncated_images(directory_path, 0)

Also remove the corresponding entry in the JSON; otherwise you'll end up failing at around 30% of training.
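
As an alternative to deleting the file (my own suggestion, not part of the original workflow), Pillow can also be told to tolerate truncated files, at the cost of training on a partially decoded image:

from PIL import Image, ImageFile

# Ask Pillow to decode truncated files as far as possible instead of raising.
ImageFile.LOAD_TRUNCATED_IMAGES = True

img = Image.open('/workspace/vqa/images/some_truncated_image.jpg')  # hypothetical path
img.load()  # no longer raises on a truncated file; the undecoded region stays blank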

Good luck

rohithbojja avatar Apr 24 '24 21:04 rohithbojja

Hi, thanks for working on a private version of anyres LLaVA.

I have fine-tuned vicuna-v1.5-7b with anyres / spatial_unpad in the same configuration as above, but the result doesn't seem to work out well on lmms-eval, with an MME score of 357 / 224 (LLaVA-v1.5-7B: 1519 / 332).

Have you done any evaluation on public benchmarks and gotten a similar score?

diridiri avatar May 13 '24 06:05 diridiri

Hi! Thanks for sharing! However, when I execute your training script, it also trains the vision encoder (adapter_model.safetensors contains vision encoder weights). Is there a way to disable gradient backprop into these weights, as is done in the original LLaVA fine-tuning?
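
For what it's worth, I expected the LoRA target modules to be picked by a helper that skips the vision-side submodules, something along these lines (a hedged sketch; the exclusion keywords are my assumption of what the repo filters out, so adjust to your checkout):

import torch

def find_lora_target_modules(model):
    # Skip vision-side submodules so no LoRA adapters (and hence no vision encoder
    # weights) end up in adapter_model.safetensors. The keyword list is an assumption.
    exclude_keywords = ['mm_projector', 'vision_tower', 'vision_resampler']
    names = set()
    for name, module in model.named_modules():
        if any(k in name for k in exclude_keywords):
            continue
        if isinstance(module, torch.nn.Linear):
            names.add(name.split('.')[-1])
    return sorted(names)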

NicoZenith avatar Jun 05 '24 14:06 NicoZenith