Pretrain and Finetune log
Can you provide the training loss logs? Also, how do you evaluate the pretraining stage's performance?
Hi @ZhengMengbin, thank you for your interest in our work.
Attached are screenshots of our training loss curves for both the pretraining and finetuning stages. You can compare against these curves to roughly verify that your training run is healthy.
Pretraining: [screenshot of the pretraining loss curve]
Visual Instruction Tuning: [screenshot of the visual instruction tuning loss curve]
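As a side note, one way to plot your own loss curve for comparison is from the `trainer_state.json` that the Hugging Face Trainer (which LLaVA's training script uses) writes to the output directory. A minimal sketch; the checkpoint path is a placeholder:

```python
# Minimal sketch (not from the original reply): plot the training loss from
# the trainer_state.json that the Hugging Face Trainer saves in the output
# directory. The checkpoint path below is a placeholder.
import json

import matplotlib.pyplot as plt

with open("./checkpoints/llava-pretrain/trainer_state.json") as f:
    state = json.load(f)

# Each entry in log_history that contains a "loss" key is a training-step log.
logs = [e for e in state["log_history"] if "loss" in e]
plt.plot([e["step"] for e in logs], [e["loss"] for e in logs])
plt.xlabel("step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```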
To evaluate the pretraining stage, you can try evaluating on COCO captioning. Note that if you use the raw CC captions, the results are not directly comparable with existing approaches, as the original CC captions are noisy and not in COCO style.
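For reference, here is a minimal sketch of the standard COCO caption scoring workflow with `pycocoevalcap`. This is an assumed setup, not an official LLaVA script, and both file paths are placeholders:

```python
# Minimal sketch (assumed workflow, not an official LLaVA script): score
# model-generated captions against COCO ground truth with pycocoevalcap.
# Both file paths are placeholders; results.json is a list of
# {"image_id": int, "caption": str} entries in standard COCO format.
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("annotations/captions_val2014.json")   # ground-truth captions
coco_result = coco.loadRes("results.json")         # model predictions

coco_eval = COCOEvalCap(coco, coco_result)
# Restrict evaluation to the images actually present in the results file.
coco_eval.params["image_id"] = coco_result.getImgIds()
coco_eval.evaluate()

# Prints BLEU, METEOR, ROUGE_L, CIDEr (and SPICE if installed).
for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```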
Hi, I am trying to reproduce your excellent results with Vicuna 1.1 (simply adding the --version v1 flag to the training command and using the Vicuna 1.1 pretrained weights instead). However, I find that the loss during our pretraining run is much higher than yours: our pretraining loss is still 2.1 after 4.6k training steps.
Can you share your training log and evaluation results for Vicuna 1.1? Thank you.
Hi @yiranyyu, the loss pattern is similar for Vicuna v1.1 and v0, but we also observe a higher pretraining loss with v1.1 than with v0, so our loss values are similar to yours. This may be due to differences in the prompt template used during pretraining.
After finetuning, the losses are similar and the model produces satisfactory results.
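A quick way to inspect this difference is to print the two conversation templates. This is a hypothetical sketch: the template keys and the exact API may differ between LLaVA versions, so check llava/conversation.py in your checkout:

```python
# Hypothetical sketch: print the v0 and v1 conversation templates to see how
# the prompts differ. Template keys may vary across LLaVA versions; check
# llava/conversation.py in your checkout for the exact names.
from llava.conversation import conv_templates

for name in ("v0", "v1"):
    conv = conv_templates[name].copy()
    conv.append_message(conv.roles[0], "Describe the image.")
    conv.append_message(conv.roles[1], None)  # leave the assistant turn open
    print(f"--- {name} ---")
    print(conv.get_prompt())
```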
Closing the issue due to inactivity; please feel free to re-open if you run into any other issues, thanks.
@haotian-liu Excuse me, I replaced the LLM and modified the code for training. The initial loss during pretraining is about 5.5, and it is still about 3 after training for one epoch. What might be the cause?
@haotian-liu
I have tried the following:
- Feeding only plain-text SFT data: the loss is normal
- Letting the vision module train together with the linear projector: this does not alleviate the problem
- Pretraining for multiple epochs: the loss stays around 3 and does not go down
- Trying two frameworks, LLaVA and MiniGPT-4: the loss is stuck at around 3 in both
Have you ever encountered this problem?
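One common pitfall worth ruling out when swapping in a new LLM, though not a confirmed cause of the plateau above, is a mismatch between the tokenizer and the embedding matrix after adding multimodal special tokens. A minimal sanity-check sketch, assuming a Hugging Face model (the model name is a placeholder):

```python
# Minimal sanity-check sketch (a common pitfall, not a confirmed cause of the
# loss plateau above): after adding special tokens such as an image token,
# the embedding matrix must be resized to match the tokenizer. The model name
# below is a placeholder for whatever LLM you swapped in.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-swapped-in-llm"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

num_added = tokenizer.add_tokens(["<image>"], special_tokens=True)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# If these differ, newly added tokens map to garbage embeddings and the loss
# can plateau well above the healthy range.
assert model.get_input_embeddings().weight.shape[0] == len(tokenizer)
```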
> @haotian-liu Excuse me, I replaced the LLM and modified the code for training. The initial loss during pretraining is about 5.5, and it is still about 3 after training for one epoch. What might be the cause?
Excuse me, we also encountered the same problem. Have you solved it? Thanks.