
Initial loss problem when fine-tuning TinyChart-3B-768 with TinyChartData

Open buckerF opened this issue 1 year ago • 1 comments

I encountered an issue while fine-tuning the TinyChart-3B-768 model on the TinyChartData dataset. The initial loss is unexpectedly high, reaching 7.6. Additionally, when using the original DeepSpeed script zero3_offload_decay.json, the loss remains constant at 0 throughout training. My setup: I installed the DeepSpeed version pinned in pyproject.toml on top of the llava-v1.5 environment, and ran vit_add_tome.py against TinyChart-3B-768-siglip.

Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high? Why does the loss remain at 0 when using the original DeepSpeed script?

I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.

Thank you for your assistance.

buckerF avatar Jul 01 '24 06:07 buckerF

Hi @buckerF, we also occasionally encounter the zero-loss issue you mentioned. It is very likely that NaNs appear in the forward/backward pass due to fp16 precision. You can try switching to bf16 or fp32 instead.
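For reference, a minimal sketch of that precision change in a DeepSpeed config (assuming the usual layout of zero3_offload_decay.json; `fp16` and `bf16` are standard DeepSpeed config sections, other keys omitted):

```json
{
  "fp16": { "enabled": false },
  "bf16": { "enabled": true }
}
```

Note that bf16 requires hardware support (e.g. Ampere or newer GPUs); on older cards, fp32 is the safer fallback.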

We have not encountered the high-loss issue you mentioned. Since TinyChart has already been trained on this data, the initial loss is expected to be low. Have you confirmed that the pre-trained parameters are loaded correctly?
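One hypothetical way to check this (not the TinyChart loading code; names are illustrative): compare the model's state-dict keys against the checkpoint's. Keys missing from the checkpoint stay randomly initialized, which would explain a high initial loss; a common culprit is a key-prefix mismatch such as a leading "model.".

```python
def checkpoint_coverage(model_keys, ckpt_keys):
    """Return (loaded, missing, unexpected) key sets for a checkpoint load."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    loaded = model_keys & ckpt_keys
    missing = model_keys - ckpt_keys      # these weights stay randomly initialized
    unexpected = ckpt_keys - model_keys   # often a prefix mismatch, e.g. "model."
    return loaded, missing, unexpected

# Toy example: a "model." prefix mismatch means nothing is actually loaded.
model_keys = ["vision_tower.weight", "lm_head.weight"]
ckpt_keys = ["model.vision_tower.weight", "model.lm_head.weight"]
loaded, missing, unexpected = checkpoint_coverage(model_keys, ckpt_keys)
print(len(loaded), len(missing), len(unexpected))  # 0 2 2
```

With PyTorch, `model.load_state_dict(ckpt, strict=False)` returns the same missing/unexpected key lists, so printing that result before training is a quick sanity check.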

zhangliang-04 avatar Jul 08 '24 02:07 zhangliang-04