[Feature] When pretraining with internvl_chat_llava/scripts_internvl/pretrain_internvit6b_448_vicuna7b.sh, train/grad_norm stays around 0.5 and the loss oscillates between 0.5 and 2.4
Motivation
As the title says, grad_norm quickly drops to around 0.5. Is one of my parameters set incorrectly?
Hardware: 1x A100 80G
Training arguments:
```bash
deepspeed --include localhost:2 \
    llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path $DATA_HOME/pretrained_mm_projector/vicuna-7b-v1.5 \
    --version plain \
    --data_path $DATA_HOME/LLaVA-Pretrain/enhanced_llava_pretrain_data_708K.json \
    --image_folder $DATA_HOME/LLaVA-Pretrain/images \
    --vision_tower $DATA_HOME/pretrained_mm_projector/InternViT-300M-448px \
    --mm_projector_type mlp2x_gelu \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -4 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ${OUTPUT_DIR} \
    --num_train_epochs 2 \
    --per_device_train_batch_size 10 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 3000 \
    --save_total_limit 3 \
    --learning_rate 1e-2 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 8192 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to "tensorboard" \
    | tee ${OUTPUT_DIR}/train.log
```
Problem log:

```
{'loss': 1.2109, 'grad_norm': 0.4884481728076935, 'learning_rate': 0.0017591721542803388, 'epoch': 0.01}
{'loss': 0.6797, 'grad_norm': 0.15263621509075165, 'learning_rate': 0.0017615239887111948, 'epoch': 0.01}
{'loss': 1.3359, 'grad_norm': 0.47711506485939026, 'learning_rate': 0.0017638758231420509, 'epoch': 0.01}
{'loss': 0.7227, 'grad_norm': 0.17119143903255463, 'learning_rate': 0.001766227657572907, 'epoch': 0.01}
{'loss': 1.6719, 'grad_norm': 0.5899024605751038, 'learning_rate': 0.001768579492003763, 'epoch': 0.01}
{'loss': 0.7617, 'grad_norm': 0.2523077428340912, 'learning_rate': 0.001770931326434619, 'epoch': 0.01}
{'loss': 1.5391, 'grad_norm': 0.3828684985637665, 'learning_rate': 0.001773283160865475, 'epoch': 0.01}
{'loss': 0.6523, 'grad_norm': 0.10931509733200073, 'learning_rate': 0.0017756349952963311, 'epoch': 0.01}
{'loss': 1.4844, 'grad_norm': 0.4769705533981323, 'learning_rate': 0.0017779868297271872, 'epoch': 0.01}
{'loss': 1.0156, 'grad_norm': 0.3717474341392517, 'learning_rate': 0.0017803386641580432, 'epoch': 0.01}
{'loss': 0.5625, 'grad_norm': 0.11652082204818726, 'learning_rate': 0.0017826904985888995, 'epoch': 0.01}
{'loss': 0.5625, 'grad_norm': 0.3019462525844574, 'learning_rate': 0.0017850423330197554, 'epoch': 0.01}
{'loss': 1.0391, 'grad_norm': 1.5588973760604858, 'learning_rate': 0.0017873941674506114, 'epoch': 0.01}
{'loss': 0.8203, 'grad_norm': 0.1840265393257141, 'learning_rate': 0.0017897460018814677, 'epoch': 0.01}
{'loss': 0.7148, 'grad_norm': 0.08739178627729416, 'learning_rate': 0.0017920978363123235, 'epoch': 0.01}
{'loss': 0.9961, 'grad_norm': 0.30124369263648987, 'learning_rate': 0.0017944496707431798, 'epoch': 0.01}
```
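Before treating this as a bug, one can check whether the losses form two stable populations rather than a diverging run. Below is a hypothetical helper (not part of the repo; the 0.9 split threshold and the text-only/multimodal attribution are assumptions) that parses the trainer log and summarizes each group; on the excerpt above it gives means around 0.68 and 1.29, which matches the batching behavior described in the reply below.

```python
import ast
import re

# Hypothetical sanity check (not part of the repo): parse the trainer log
# and split losses at an assumed threshold of 0.9 to see whether the
# oscillation is two stable populations rather than divergence.
with open("train.log") as f:  # the file written by `tee` above
    entries = [ast.literal_eval(m) for m in re.findall(r"\{'loss'[^}]*\}", f.read())]

low = [e["loss"] for e in entries if e["loss"] < 0.9]    # assumed text-only batches
high = [e["loss"] for e in entries if e["loss"] >= 0.9]  # assumed multimodal batches
print(f"low : n={len(low)}, mean={sum(low) / len(low):.3f}")
print(f"high: n={len(high)}, mean={sum(high) / len(high):.3f}")
```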
Related resources
No response
Additional context
No response
Hi, does your data include some pure-text samples? In the LLaVA codebase, pure-text data and multimodal data are placed into separate batches, so the loss jumping between iterations is normal.
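For readers hitting the same pattern, here is a minimal sketch of that grouping idea, under assumptions; it is an illustration, not LLaVA's actual LengthGroupedSampler. It assumes the convention (similar in spirit to LLaVA's, which flags text-only samples via negated lengths) that a positive length marks a multimodal sample and a non-positive length a text-only one.

```python
import random

def modality_grouped_batches(lengths, batch_size, seed=0):
    """Illustrative only: form batches so that text-only and multimodal
    samples never share a batch. Convention assumed here: positive length
    = multimodal sample, non-positive length = text-only sample."""
    rng = random.Random(seed)
    mm = [i for i, n in enumerate(lengths) if n > 0]    # multimodal indices
    txt = [i for i, n in enumerate(lengths) if n <= 0]  # text-only indices
    rng.shuffle(mm)
    rng.shuffle(txt)
    batches = [mm[i:i + batch_size] for i in range(0, len(mm), batch_size)]
    batches += [txt[i:i + batch_size] for i in range(0, len(txt), batch_size)]
    rng.shuffle(batches)  # interleave pure-text and multimodal batches
    return batches

# Toy run: 4 multimodal samples (positive) and 4 text-only samples (negative).
print(modality_grouped_batches([120, -80, 200, -64, 150, -96, 180, -90], batch_size=2))
```

Each emitted batch is single-modality, so consecutive iterations can land in different loss regimes; that is the alternating ~0.6-0.8 vs ~1.0-1.7 pattern visible in the log above, rather than a sign of a wrong hyperparameter.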