mPLUG-DocOwl
Initial loss problem when fine-tuning TinyChart-3B-768 with TinyChartData
I encountered an issue while fine-tuning the TinyChart-3B-768 model on the TinyChartData dataset. The initial loss is unexpectedly high, reaching 7.6. Additionally, when I use the original DeepSpeed config zero3_offload_decay.json, the loss remains constant at 0 throughout training.
My setup: starting from the llava-v1.5 environment, I changed the DeepSpeed version according to pyproject.toml and ran vit_add_tome.py on TinyChart-3B-768-siglip.
Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high? Why does the loss remain at 0 when using the original DeepSpeed script?
I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.
Thank you for your assistance.
Hi @buckerF, we also sometimes encounter the zero-loss issue you mentioned. It is very likely that there are NaNs in the forward/backward pass due to fp16 precision. You can try bf16 or fp32 instead.
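To illustrate why fp16 is prone to this (not the actual training code, just a minimal NumPy sketch): fp16 can only represent values up to about 65504, so moderately large intermediate values overflow to inf, and operations like inf - inf then produce NaN. When the loss becomes NaN, DeepSpeed's loss scaler skips those steps, which can show up as a constant 0 loss in the logs.

```python
import numpy as np

# exp(20) ~ 4.85e8: far beyond the fp16 maximum (~65504),
# so it overflows to inf in half precision...
fp16_val = np.exp(np.float16(20.0))
print(np.isinf(fp16_val))      # True

# ...but is perfectly representable in fp32 (and bf16, which
# shares fp32's exponent range).
fp32_val = np.exp(np.float32(20.0))
print(np.isfinite(fp32_val))   # True

# Once an inf appears, subtractions like inf - inf yield NaN,
# poisoning the loss and gradients.
print(np.float16('inf') - np.float16('inf'))  # nan
```

This is why switching the precision in the DeepSpeed config from fp16 to bf16 or fp32 often resolves the zero-loss symptom.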
We do not encounter the high-loss issue you mentioned. Since TinyChart has already been trained on this data, the loss is expected to be low. Did you make sure that the pre-trained parameters are correctly loaded?
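One quick way to check whether the pre-trained parameters were actually loaded is to diff the checkpoint's state-dict keys against the model's. The function and variable names below are illustrative, not from the TinyChart codebase:

```python
def diff_state_dicts(ckpt_state, model_state):
    """Compare parameter names in a checkpoint vs. the model.

    Returns (missing, unexpected):
      missing    - model params absent from the checkpoint
                   (these stay at random init -> high initial loss)
      unexpected - checkpoint params the model never consumed
                   (often a sign of a key-prefix mismatch)
    """
    ckpt_keys, model_keys = set(ckpt_state), set(model_state)
    missing = model_keys - ckpt_keys
    unexpected = ckpt_keys - model_keys
    return missing, unexpected


# Toy usage with placeholder key names:
missing, unexpected = diff_state_dicts(
    {"vision_tower.weight": 0, "lm_head.weight": 0},
    {"vision_tower.weight": 0, "mm_projector.weight": 0},
)
print(missing)     # {'mm_projector.weight'}
print(unexpected)  # {'lm_head.weight'}
```

In PyTorch, `model.load_state_dict(..., strict=False)` returns the same kind of missing/unexpected key lists; a non-empty `missing_keys` there would explain an initial loss as high as 7.6.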