[Question] Add extra layers in LLaVA
Question
I have tried to add extra linear layers to handle some additional features, and I only want to train these layers. I use the script below to finetune the whole model, but when I print the gradients and weights of the new layers, the gradients are None and the weights do not change. So I want to figure out whether this is a problem with LoRA or something else. I have already set requires_grad to True in training.py.
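For reference, this is roughly what I do in the training script to mark the extra layer as trainable and to inspect it after a step (a minimal sketch, not my exact code; `model` stands for the wrapped LLaVA model and `test_head` is the name of the layer I added in llava_arch.py):

def unfreeze_extra_layer(model, keyword="test_head"):
    # mark only my extra layer as trainable
    for name, param in model.named_parameters():
        if keyword in name:
            param.requires_grad = True

def report_extra_layer(model, keyword="test_head"):
    # called after loss.backward() / a training step
    for name, param in model.named_parameters():
        if keyword in name:
            # this is where param.grad prints as None and the weight never changes
            print(name, param.requires_grad, param.grad, param.detach().norm().item())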
Script
deepspeed train.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed /mnt/modified_llavas/LLaVA/scripts/zero3.json \
    --model_name_or_path liuhaotian/llava-v1.5-7b \
    --version v2 \
    --data_path /mnt/f/modified_llavas/LLaVA/llava_out_new.json \
    --image_folder test \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir /mnt/f/modified_llavas/LLaVA/llava_fullweights \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
Extra layers
I have added the test_head layer in llava_arch.py; a rough sketch of it is below. Does anyone have ideas about what might be wrong?
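Roughly, the layer looks like this (a minimal sketch, not my exact code; feature_dim and hidden_size are placeholders for my extra-feature and LLM hidden dimensions):

import torch
import torch.nn as nn

class TestHead(nn.Module):
    # rough sketch of the extra layer; the real code sits in my modified llava_arch.py
    def __init__(self, feature_dim: int, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(feature_dim, hidden_size)

    def forward(self, extra_features: torch.Tensor) -> torch.Tensor:
        # project the extra features into the LLM hidden size
        return self.proj(extra_features)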