LLaVA
Pre-training with MPT-7B went well but fine-tuning it further gives garbled/random outputs
Discussion
After a few bug fixes, I ran the pre-training code with the mosaicml/mpt-7b model.
The pre-training script I used:
deepspeed train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path mpt-7b \
    --version mpt \
    --data_path LLaVA-Pretrain/blip_laion_cc_sbu_558k.json \
    --image_folder LLaVA-Pretrain/images \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-mpt-7b-vit-l-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 1e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 1 \
    --lazy_preprocess True \
    --report_to wandb
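Since --tune_mm_mlp_adapter True means only the projector is trained in this stage, the output directory should end up with the projector weights in mm_projector.bin. Just as a reference, here is a minimal sketch to confirm those weights were actually written; the path comes from the --output_dir above, and the mm_projector.bin file name is an assumption based on the default LLaVA save layout:

import torch

# Path assumed from the --output_dir above; mm_projector.bin is where the
# pre-training stage is expected to save the tuned projector.
state = torch.load(
    "./checkpoints/llava-mpt-7b-vit-l-pretrain/mm_projector.bin",
    map_location="cpu",
)
for name, tensor in state.items():
    print(name, tuple(tensor.shape), tensor.dtype)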
I ran inference on this pre-trained model using the following script:
python -m llava.serve.cli \
    --model-base mpt-7b \
    --model-path ./checkpoints/llava-mpt-7b-vit-l-pretrain/ \
    --image-file "https://cdn.pixabay.com/photo/2024/02/28/07/42/european-shorthair-8601492_1280.jpg" \
    --temperature 0.1
The output looks as follows, which is good enough for the pre-training stage:
After this, I ran the following instruction-tuning script:
deepspeed train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path mpt-7b \
    --version mpt \
    --data_path LLaVA-InTune/llava_v1_5_mix665k.json \
    --image_folder LLaVA-InTune/ \
    --vision_tower openai/clip-vit-large-patch14 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-mpt-7b-vit-l-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ./checkpoints/llava-mpt-7b-vit-l-lora-fulldata \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --image_aspect_ratio pad \
    --group_by_modality_length True
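With --lora_enable, my understanding is that this run saves the LoRA adapter (adapter_config.json plus the adapter weights) and a separate non_lora_trainables.bin holding the projector and other fully trained parameters. A quick sketch to list what actually ended up in the output directory; the expected file names are assumptions based on that layout:

import os

# Output directory from the fine-tuning command above (adjust to your run).
out_dir = "./checkpoints/llava-mpt-7b-vit-l-lora-fulldata"

# Expecting adapter_config.json, the adapter weights, and non_lora_trainables.bin
# (assumption: LLaVA's LoRA saving convention).
for name in sorted(os.listdir(out_dir)):
    size_mb = os.path.getsize(os.path.join(out_dir, name)) / 1e6
    print(f"{name:40s} {size_mb:9.1f} MB")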
The training loss graph looked as follows:
However, when I now run inference on this saved checkpoint using this script:
python -m llava.serve.cli \
    --model-base mpt-7b \
    --model-path ./checkpoints/llava-mpt-7b-vit-l-lora-fulldata/ \
    --image-file "https://as1.ftcdn.net/v2/jpg/06/05/37/40/1000_F_605374009_hEUHatmKPzuHTIacg7rLneAgnLHUgegM.jpg" \
    --temperature 0.1
I get completely unrelated output (the model starts generating Chinese text about Docker containers), as shown below:
[2024-04-30 06:35:20,791] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
You are using a model of type mpt to instantiate a model of type llava_mpt. This is not supported for all configurations of models and can yield errors.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LLaVA from base model...
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
/home/user/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.63s/it]
Some weights of LlavaMptForCausalLM were not initialized from the model checkpoint at /mnt/localssd/mpt-7b-test and are newly initialized: ['transformer.mm_projector.0.bias', 'transformer.mm_projector.0.weight', 'transformer.mm_projector.2.bias', 'transformer.mm_projector.2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading additional LLaVA weights...
Loading LoRA weights...
Merging LoRA weights...
Model is loaded...
user: describe this image
assistant: [
Docker 容器技朝
前言
Docker 是一个开源的容器技朝，它可以将一个应用程序的所有依赖项和配置文件打包在一个单一的容器中，从而可以在不同的操作系统上运行。
.........
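The "newly initialized" warning for transformer.mm_projector.* is printed while the base model is being loaded, before "Loading additional LLaVA weights...", so I would expect the projector to be overwritten from the LoRA checkpoint afterwards. To rule out a missing or misnamed projector, here is a small diagnostic sketch; the file name and the key-prefix check are assumptions based on how I understand the LoRA checkpoint layout:

import torch

# Hypothetical path: non_lora_trainables.bin sitting next to the LoRA adapter.
path = "./checkpoints/llava-mpt-7b-vit-l-lora-fulldata/non_lora_trainables.bin"

state = torch.load(path, map_location="cpu")
proj = {k: v for k, v in state.items() if "mm_projector" in k}

# If this prints nothing, the projector never made it into the checkpoint;
# if the key prefixes don't match what the loader expects, the weights
# would not be applied even though the file exists.
for name, tensor in proj.items():
    print(name, tuple(tensor.shape), float(tensor.float().norm()))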
Can anyone help me understand why this is happening and how I can resolve it?