Finetuning llava-v1.6-34b model
Describe the issue
Issue: I am fine-tuning llava-v1.6-34b on some data, about 75,866 images with a resolution of 750x750 per image. I have tried fine-tuning with A100 80GB GPUs (6 devices, 8 devices) and H100 80GB GPUs (2 devices, 6 devices, and 8 devices); when training, it either exits with an out-of-memory error or gives another error, which I will provide below.
What are the specifications needed to successfully train the llava-v1.6-34b model, or is there some other reason for the issue?
Command:
deepspeed llava/train/train_mem.py \
--deepspeed ./scripts/zero3.json \
--model_name_or_path liuhaotian/llava-v1.6-34b \
--version v1 \
--data_path ./training001/metadata.json \
--image_folder ./ \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--bf16 True \
--output_dir ./checkpoints/llava-v1.6-34b-task \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb
Log:
1.
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Parameter Offload: Total persistent parameters: 1213440 in 369 params
Traceback (most recent call last):
File "/workspace/LLaVA/llava/train/train_mem.py", line 4, in <module>
train(attn_implementation="flash_attention_2")
File "/workspace/LLaVA/llava/train/train.py", line 969, in train
trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1687, in _inner_training_loop
model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1198, in prepare
result = self._prepare_deepspeed(*args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1537, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/__init__.py", line 171, in initialize
engine = DeepSpeedEngine(args=args,
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 304, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1234, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1563, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer_Stage3(
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 362, in __init__
self._setup_for_real_optimizer()
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 465, in _setup_for_real_optimizer
self._create_fp32_partitions()
File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 854, in _create_fp32_partitions
self.device).clone().float().detach())
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.16 GiB. GPU 0 has a total capacty of 79.15 GiB of which 1.16 GiB is free. Process 3400995 has 77.98 GiB memory in use. Of the allocated memory 74.87 GiB is allocated by PyTorch, and 2.48 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2024-03-21 10:19:08,590] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1464
[2024-03-21 10:19:08,591] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python', '-u', 'llava/train/train_mem.py', '--local_rank=0', '--deepspeed', './scripts/zero3.json', '--model_name_or_path', 'liuhaotian/llava-v1.6-34b', '--version', 'v1', '--data_path', './training001/metadata.json', '--image_folder', './training001', '--vision_tower', 'openai/clip-vit-large-patch14-336', '--mm_projector_type', 'mlp2x_gelu', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'True', '--output_dir', './checkpoints/llava-v1.6-34b-task', '--num_train_epochs', '1', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '50000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'wandb'] exits with return code = 1
2.
[2024-03-25 14:13:32,240] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3772
[2024-03-25 14:13:32,979] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3773
[2024-03-25 14:13:32,981] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3774
[2024-03-25 14:13:33,154] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3775
[2024-03-25 14:13:33,156] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3776
[2024-03-25 14:13:33,156] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3777
[2024-03-25 14:13:33,157] [ERROR] [launch.py:321:sigkill_handler] ['/usr/bin/python', '-u', 'llava/train/train_mem.py', '--local_rank=5', '--deepspeed', './scripts/zero3.json', '--model_name_or_path', 'liuhaotian/llava-v1.6-34b', '--version', 'v1', '--data_path', './training001/metadata.json', '--image_folder', './', '--vision_tower', 'openai/clip-vit-large-patch14-336', '--mm_projector_type', 'mlp2x_gelu', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'True', '--output_dir', './checkpoints/llava-v1.6-34b-task', '--num_train_epochs', '1', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '50000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'wandb'] exits with return code = 1
root@04b9ed35b384:/workspace/LLaVA#
@adabadaramola I have a slightly different issue. Can you please help me with it? I followed the same fine-tune script, and I am getting an error about not being able to import the llava module while executing the train_mem.py file.
If you can share the code/script you used to start fine-tuning, it would be helpful to me.
Thanks
@adabadaramola I didn't realize that v1.6 was fine-tunable yet. Were you able to fine-tune a smaller 7b model instead?
@adabadaramola I have a slightly different issue. Can you please help me with it? I followed the same fine-tune script, and I am getting an error about not being able to import the llava module while executing the train_mem.py file.
If you can share the code/script you used to start fine-tuning, it would be helpful to me.
Thanks
You need to do pip install from the llava folder using the command
pip install -e .
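(Note: the editable install needs to be run from the root of the LLaVA checkout, i.e. the folder containing the repo's setup files, so that import llava resolves. A minimal example, assuming the clone lives at /workspace/LLaVA as in the logs above:)
# adjust the path to wherever you cloned the repo
cd /workspace/LLaVA
pip install -e .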
I used 3 A100 80GB gpus for 1.6-34b and 1 A100 80GB for 1.6-mistral-7b. note: I've only tried this for low rank fine-tuning, not full! https://github.com/arielnlee/LLaVA-1.6-ft
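(For anyone landing here, a rough sketch of what a low-rank run can look like: the original command from this issue plus the LoRA flags used by the upstream repo's scripts/v1_5/finetune_lora.sh (--lora_enable, --lora_r, --lora_alpha, --mm_projector_lr, and a higher base learning rate). This is an illustration under those assumptions, not necessarily the exact script from the linked fork.)
deepspeed llava/train/train_mem.py \
--lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
--deepspeed ./scripts/zero3.json \
--model_name_or_path liuhaotian/llava-v1.6-34b \
--version v1 \
--data_path ./training001/metadata.json \
--image_folder ./ \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--group_by_modality_length True \
--bf16 True --tf32 True \
--output_dir ./checkpoints/llava-v1.6-34b-task-lora \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 1 \
--learning_rate 2e-4 \
--weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb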
@adabadaramola I didn't realize that v1.6 was fine-tunable yet. Were you able to fine-tune a smaller 7b model instead?
Yeah, I saw someone on YouTube fine-tune the smaller 7b v1.6; no, I did not try that, I felt the 34b was more suitable for my use case.
I used 3 A100 80GB gpus for 1.6-34b and 1 A100 80GB for 1.6-mistral-7b. note: I've only tried this for low rank fine-tuning, not full! https://github.com/arielnlee/LLaVA-1.6-ft
Thanks for this, I will try it out
I used 3 A100 80GB gpus for 1.6-34b and 1 A100 80GB for 1.6-mistral-7b. note: I've only tried this for low rank fine-tuning, not full! https://github.com/arielnlee/LLaVA-1.6-ft
Thanks, the script works. I have been trying to evaluate after training but I keep getting errors; can you please help or tell me how you go about it? I already have the checkpoints in the output dir.
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I used 3 A100 80GB gpus for 1.6-34b and 1 A100 80GB for 1.6-mistral-7b. note: I've only tried this for low rank fine-tuning, not full! https://github.com/arielnlee/LLaVA-1.6-ft
Thanks, the script works. I have been trying to evaluate after training but I keep getting errors; can you please help or tell me how you go about it? I already have the checkpoints in the output dir.
Do you mean the model is outputting errors, or that you get errors when you try to run it? Before evaluating, I merge the LoRA weights back onto the base model, then I eval on the "merged" fine-tune. There's a Python file in scripts that you can use to merge.
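(For reference, the merge step mentioned above can be done with the repo's scripts/merge_lora_weights.py. The paths below are placeholders; substitute your own LoRA checkpoint directory, base model, and output directory:)
python scripts/merge_lora_weights.py \
--model-path ./checkpoints/llava-v1.6-34b-task-lora \
--model-base liuhaotian/llava-v1.6-34b \
--save-model-path ./checkpoints/llava-v1.6-34b-task-merged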
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
It would be great, I'd love to chat with you. What's your email? Btw I contacted you on your website :)
Hi, based on my experience fine-tuning Yi-34B, it seems like you need to use batch size 1 and zero3_offload.
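("zero3_offload" here means a ZeRO stage-3 DeepSpeed config that offloads optimizer state and parameters to CPU; the LLaVA repo ships one as scripts/zero3_offload.json, which you would pass via --deepspeed instead of ./scripts/zero3.json, together with --per_device_train_batch_size 1. Below is a minimal sketch of the offload-relevant part of such a config; the shipped file contains additional fields such as precision and optimizer settings.)
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}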
I used 3 A100 80GB gpus for 1.6-34b and 1 A100 80GB for 1.6-mistral-7b. note: I've only tried this for low rank fine-tuning, not full! https://github.com/arielnlee/LLaVA-1.6-ft
Hi! LLaVA trains the ViT parameters in its second training stage, but they didn't release their ViT parameters. Could you tell me how you dealt with this, or did you just use the raw CLIP parameters? Thanks!
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
Hey, @arielnlee Let me know if you got something :)
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
@arielnlee How did you fine-tune LLaVA 1.6 34b? Do you have any resources for this?
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
Hey, @arielnlee Let me know if you got something :)
Apologies, the week got away from me, but it's still on my list. In the meantime it should work by using the repo!
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
@arielnlee How did you fine-tune LLaVA 1.6 34b? Do you have any resources for this?
I have a question. I've been trying to fine-tune llava 7b; everything runs, but the results did not change at all. I just wanted it to label one image according to what I fine-tuned it on, but it still recognizes the image as something else.
How big is your fine-tuning dataset? And what's the task? For my specific use-case, the scripts work well. The size of my dataset is ~20k.
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
Hey, @arielnlee Let me know if you got something :)
Apologies, the week got away from me, but it's still on my list. In the meantime it should work by using the repo!
@arielnlee I'm curious - roughly how many example lines does it take to fine-tune LLaVA and get results? Also, can you make a notebook this week?
Hey, @arielnlee Do you have a notebook for fine-tuning 1.6-34b?
I don’t, but I can throw one together this week!
Hey, @arielnlee Let me know if you got something :)
Apologies, the week got away from me, but it's still on my list. In the meantime it should work by using the repo!
@arielnlee I'm curious - roughly how many example lines does it take to fine-tune LLaVA and get results? Also, can you make a notebook this week?
I am trying to feed it the same image that I used for fine-tuning, but it still predicts it wrong. Am I getting anything wrong?
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
I don't think the training script for 1.5 works for 1.6 at the moment. I looked into llava/train/train.py, llava/model/builder.py, and llava/model/language_model, and noticed that they are not compatible with training 1.6. For example, even though I tried to fine-tune llava 1.6 Mistral, the training file instantiated a LLaVA-Llama model for me, because train.py only knows how to instantiate the Llama and MPT classes. I think if you want to fine-tune 1.6, you need to change many of the files manually.
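(To make the point above concrete, here is an illustrative, untested sketch of the dispatch being described: train.py currently only distinguishes MPT from "everything else", which is loaded as LlavaLlamaForCausalLM, so a Mistral checkpoint gets the Llama wrapper. The helper function below is hypothetical; the class names are taken from llava/model/language_model in recent versions of the repo, so verify them against your checkout.)
# Hypothetical helper, not a tested patch against train.py.
from llava.model import (
    LlavaLlamaForCausalLM,
    LlavaMistralForCausalLM,
    LlavaMptForCausalLM,
)

def pick_llava_class(model_name_or_path: str):
    """Choose the LLaVA wrapper class from the checkpoint name (illustrative only)."""
    name = model_name_or_path.lower()
    if "mpt" in name:
        return LlavaMptForCausalLM
    if "mistral" in name:
        return LlavaMistralForCausalLM
    return LlavaLlamaForCausalLM

# Example: pick_llava_class("liuhaotian/llava-v1.6-mistral-7b") -> LlavaMistralForCausalLM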
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
I don't think the training script for 1.5 works for 1.6 at the moment. I looked into llava/train/train.py, llava/model/builder.py, and llava/model/language_model, and noticed that they are not compatible with training 1.6. For example, even though I tried to fine-tune llava 1.6 Mistral, the training file instantiated a LLaVA-Llama model for me, because train.py only knows how to instantiate the Llama and MPT classes. I think if you want to fine-tune 1.6, you need to change many of the files manually.
@songchx24 Check here: https://github.com/arielnlee/LLaVA-1.6-ft
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
I don't think the training script for 1.5 works for 1.6 at the moment. I looked into llava/train/train.py, llava/model/builder.py, and llava/model/language_model, and noticed that they are not compatible with training 1.6. For example, even though I tried to fine-tune llava 1.6 Mistral, the training file instantiated a LLaVA-Llama model for me, because train.py only knows how to instantiate the Llama and MPT classes. I think if you want to fine-tune 1.6, you need to change many of the files manually.
@songchx24 Check here: https://github.com/arielnlee/LLaVA-1.6-ft
So you have successfully changed the code to fine-tune 1.6? Nice! Thanks for the info!
Btw, may I ask whether it supports LoRA for all 1.6 versions or just Mistral?
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
I don't think the training script for 1.5 works for 1.6 at the moment. I looked into llava/train/train.py, llava/model/builder.py, and llava/model/language_model, and noticed that they are not compatible with training 1.6. For example, even though I tried to fine-tune llava 1.6 Mistral, the training file instantiated a LLaVA-Llama model for me, because train.py only knows how to instantiate the Llama and MPT classes. I think if you want to fine-tune 1.6, you need to change many of the files manually.
OK, that explains a lot, because when I tried 1.6 it basically showed no improvement after LoRA with the main repo. Btw, @arielnlee has shared a repo; I think someone there has already changed the code to make it suitable for fine-tuning 1.6.
If someone has fine-tuned the 1.6 models (7B and 13B), can you mention the minimum hardware requirements?
If someone has fine-tuned the 1.6 models (7B and 13B), can you mention the minimum hardware requirements?
@babuus Not sure about the minimum requirements, but it seems 1 A100 80GB works, per https://github.com/haotian-liu/LLaVA/issues/1335#issuecomment-2023922331
