LLaVA-NeXT
Finetune llava-onevision-qwen2-0.5b-si on a custom dataset
Hi, I'd like to finetune llava-onevision-qwen2-0.5b-si on my own dataset. During inference after training, a warning appears: "Some weights of LlavaQwenForCausalLM were not initialized from the model checkpoint at /mnt/dolphinfs/ssd_pool/docker/user/hadoop-perception-zw04/baiyan02/llava_log/llava-onevision-qwen2-0.5b-si-test and are newly initialized: ['lm_head.weight']." In addition, the inference output is garbled. However, when I test the original llava-onevision-qwen2-0.5b-si model on its own, it works normally.
My training script:

META_NAME='test'
OUTPUT_DIR=llava-onevision-qwen2-0.5b-si-$META_NAME
LLM_VERSION="llava-onevision-qwen2-0.5b-si"
VISION_MODEL_VERSION="llava-data/llava/siglip"
DATA_PATH="test.json"
PROMPT_VERSION="qwen_1_5"
LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
VISION_MODEL_VERSION_CLEAN="${VISION_MODEL_VERSION//\//_}"
RUN_NAME="llava-onevision-${VISION_MODEL_VERSION_CLEAN}-${LLM_VERSION_CLEAN}-${META_NAME}"
echo "MID_RUN_NAME: ${RUN_NAME}"
torchrun --nproc_per_node=8 --nnodes=1 \
    llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path ${LLM_VERSION} \
    --version $PROMPT_VERSION \
    --data_path ${DATA_PATH} \
    --image_folder "" \
    --mm_tunable_parts="mm_vision_tower,mm_mlp_adapter,mm_language_model" \
    --mm_vision_tower_lr=2e-6 \
    --vision_tower ${VISION_MODEL_VERSION} \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --group_by_modality_length True \
    --image_aspect_ratio anyres_max_9 \
    --image_grid_pinpoints "(1x1),...,(6x6)" \
    --mm_patch_merge_type spatial_unpad \
    --bf16 True \
    --run_name $RUN_NAME \
    --output_dir ${OUTPUT_DIR} \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 32768 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to none \
    --torch_compile True \
    --torch_compile_backend "inductor" \
    --dataloader_drop_last True \
    --frames_upbound 32 \
    --attn_implementation sdpa
During inference, I load my model like this:

model_name = "llava_qwen"
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, attn_implementation="sdpa", device_map="auto", multimodal=True)

Thank you very much for your assistance.
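Edit: for completeness, here is roughly how I run generation after the load above, following the repo's standard qwen_1_5 inference flow (a sketch; the image path and question are placeholders):

import copy
import torch
from PIL import Image
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

# Build a qwen_1_5 prompt containing one image placeholder token.
conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe this image.")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Preprocess the image with the model's anyres settings.
image = Image.open("test.jpg")  # placeholder image
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]

# Tokenize, splicing in the image token at IMAGE_TOKEN_INDEX, then generate.
input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)
output_ids = model.generate(
    input_ids, images=image_tensor, image_sizes=[image.size],
    do_sample=False, max_new_tokens=128)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])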
Same issue. Have you solved it?
Updating model_type in config.json to "qwen2" works.
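A minimal sketch of that edit, assuming a standard Hugging Face config.json in your checkpoint directory (the path is a placeholder):

import json

cfg_path = "/path/to/llava-onevision-qwen2-0.5b-si-test/config.json"  # placeholder checkpoint dir
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["model_type"] = "qwen2"  # the saved checkpoint typically has "llava_qwen" here

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)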
I solved this problem by directly replacing the config.json in the model save directory with the official original config.json. But I don't understand whether this error comes from something I did wrong or from a bug in the project.
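For anyone else hitting this, a sketch of that replacement, assuming the base weights came from the lmms-lab/llava-onevision-qwen2-0.5b-si repo on the Hub (the local path is a placeholder):

import shutil
from huggingface_hub import hf_hub_download

# Fetch the original config.json from the Hub and overwrite the saved one.
official_cfg = hf_hub_download(
    repo_id="lmms-lab/llava-onevision-qwen2-0.5b-si",
    filename="config.json",
)
shutil.copy(official_cfg, "/path/to/llava-onevision-qwen2-0.5b-si-test/config.json")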
Did the same