LLaVA-NeXT
LoRA fine-tuning of llava-onevision-qwen2-7b-ov
When I fine-tune with LoRA, the model converges poorly. The hyperparameters are set as follows (see the step-count sketch after the list):
--lora_enable True
--deepspeed scripts/zero3.json
--model_name_or_path ${MODEL}
--version ${PROMPT_VERSION}
--data_path ./data/onevision_data.yaml
--vision_tower ${VISION_MODEL_VERSION}
--mm_projector_type mlp2x_gelu
--mm_vision_select_layer -2
--mm_use_im_start_end False
--mm_use_im_patch_token False
--group_by_modality_length True
--image_aspect_ratio pad
--image_grid_pinpoints "(1x1),...,(6x6)"
--mm_patch_merge_type spatial_unpad
--bf16 True
--output_dir "./checkpoints/llava-ov-lora"
--num_train_epochs 2
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
--gradient_accumulation_steps 8
--evaluation_strategy "no"
--save_strategy "steps"
--save_steps 100
--save_total_limit 2
--learning_rate 1e-5
--weight_decay 0.
--warmup_ratio 0.03
--lr_scheduler_type "cosine"
--logging_steps 1
--tf32 True
--model_max_length 4096
--gradient_checkpointing True
--dataloader_num_workers 4
--lazy_preprocess True
--report_to tensorboard
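For reference, here is a quick sketch of the effective batch size and optimizer step count these flags imply, since those interact with the warmup ratio and cosine schedule. The GPU count and dataset size are assumptions for illustration only, not values from my actual run:

```python
# Sketch: effective batch size / step count implied by the flags above.
# num_gpus and num_samples are placeholder assumptions, not from my run.
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
num_gpus = 8            # assumption
num_samples = 100_000   # assumption
num_train_epochs = 2

global_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = num_samples // global_batch_size
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(0.03 * total_steps)  # --warmup_ratio 0.03

print(f"global batch size: {global_batch_size}")                    # 256 with the assumed 8 GPUs
print(f"total optimizer steps: {total_steps}, warmup: {warmup_steps}")
```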
What could be the possible causes?
Also, the loss is still around 0.3 when training finishes. Is that abnormal?