Pretrain and Finetune log
Can you provide the training loss logs? Also, how do you evaluate the pretraining stage's performance?
Hi @ZhengMengbin, thank you for your interest in our work.
Attached are screenshots of our training loss curves for both the pretraining and finetuning stages. You can compare against these curves to roughly verify that your training run is healthy.
Pretraining: [screenshot of the pretraining loss curve]
Visual Instruction Tuning: [screenshot of the visual instruction tuning loss curve]
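As a side note, one way to plot your own loss curve for comparison is from the `trainer_state.json` that the Hugging Face Trainer (which LLaVA's training script uses) writes to the output directory. A minimal sketch; the checkpoint path is a placeholder:

```python
# Minimal sketch (not from the original reply): plot the training loss from
# the trainer_state.json that the Hugging Face Trainer saves in the output
# directory. The checkpoint path below is a placeholder.
import json

import matplotlib.pyplot as plt

with open("./checkpoints/llava-pretrain/trainer_state.json") as f:
    state = json.load(f)

# Each entry in log_history that contains a "loss" key is a training-step log.
logs = [e for e in state["log_history"] if "loss" in e]
plt.plot([e["step"] for e in logs], [e["loss"] for e in logs])
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```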
To evaluate the pretraining stage, you can try evaluating on COCO captioning. Note that if you use the raw CC captions, the results are not directly comparable with existing approaches, as the original CC captions are noisy and not in COCO style.
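For reference, here is a minimal sketch of the standard COCO caption scoring workflow with `pycocoevalcap`. This is an assumed setup, not an official LLaVA script, and both file paths are placeholders:

```python
# Minimal sketch (assumed workflow, not an official LLaVA script): score
# model-generated captions against COCO ground truth with pycocoevalcap.
# Both file paths are placeholders; results.json is a list of
# {"image_id": int, "caption": str} entries in standard COCO format.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("annotations/captions_val2014.json")   # ground-truth captions
coco_result = coco.loadRes("results.json")         # model predictions

coco_eval = COCOEvalCap(coco, coco_result)
# Restrict evaluation to the images actually present in the results file.
coco_eval.params["image_id"] = coco_result.getImgIds()
coco_eval.evaluate()

# Prints BLEU, METEOR, ROUGE_L, CIDEr (and SPICE if installed).
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```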
Hi, I am trying to reproduce your excellent results with Vicuna 1.1 (simply adding the --version v1 flag to the training command and using the Vicuna 1.1 pretrained weights instead). However, I find that the loss during our pretraining run is much higher than yours: our pretraining loss is still 2.1 after 4.6k training steps.
Can you share your training log and evaluation results for Vicuna 1.1? Thank you.
Hi @yiranyyu, the loss pattern is similar for Vicuna v1.1 and v0, but we also observe a higher pretraining loss with v1.1 than with v0, so our loss values are similar to yours. This may be due to differences in the prompt template used during pretraining.
After finetuning, the losses are similar and the model produces satisfactory results.
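A quick way to inspect this difference is to print the two conversation templates. This is a hypothetical sketch: the template keys and the exact API may differ between LLaVA versions, so check llava/conversation.py in your checkout:

```python
# Hypothetical sketch: print the v0 and v1 conversation templates to see how
# the prompts differ. Template keys may vary across LLaVA versions; check
# llava/conversation.py in your checkout for the exact names.
from llava.conversation import conv_templates

for name in ("v0", "v1"):
    conv = conv_templates[name].copy()
    conv.append_message(conv.roles[0], "Describe the image.")
    conv.append_message(conv.roles[1], None)  # leave the assistant turn open
    print(f"--- {name} ---")
    print(conv.get_prompt())
```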
Closing the issue due to inactivity; please feel free to re-open if you run into any other issues, thanks.
@haotian-liu Excuse me, I replaced the LLM and modified the code for training. The initial loss during pretraining is about 5.5, and it is still about 3 after training for one epoch. What might be the cause?
@haotian-liu
I have tried the following:
- Feeding only plain-text SFT data: the loss is normal
- Letting the vision module train together with the linear projector: this does not alleviate the problem
- Pretraining for multiple epochs: the loss stays around 3 and does not go down
- Trying two frameworks, LLaVA and MiniGPT-4: the loss is stuck at around 3 in both
Have you ever encountered this problem?
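One common pitfall worth ruling out when swapping in a new LLM, though not a confirmed cause of the plateau above, is a mismatch between the tokenizer and the embedding matrix after adding multimodal special tokens. A minimal sanity-check sketch, assuming a Hugging Face model (the model name is a placeholder):

```python
# Minimal sanity-check sketch (a common pitfall, not a confirmed cause of the
# loss plateau above): after adding special tokens such as an image token,
# the embedding matrix must be resized to match the tokenizer. The model name
# below is a placeholder for whatever LLM you swapped in.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-swapped-in-llm"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_added = tokenizer.add_tokens(["<image>"], special_tokens=True)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# If these differ, newly added tokens map to garbage embeddings and the loss
# can plateau well above the healthy range.
assert model.get_input_embeddings().weight.shape[0] == len(tokenizer)
```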
> @haotian-liu Excuse me, I replaced the LLM and modified the code for training. The initial loss during pretraining is about 5.5, and it is still about 3 after training for one epoch. What might be the cause?
Excuse me, we also encountered the same problem. Have you solved it? Thanks.