'loss': 0.0 and 'grad_norm' stays the same at every step in task LoRA fine-tuning
I am training a task LoRA on "liuhaotian/llava-v1.5-13b" by following the same script as in the LLaVA repo: https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task_lora.sh
The script runs fine in the LLaVA repo (https://github.com/haotian-liu/LLaVA/tree/main). When I run it in this LLaVA-NeXT repo (with a few lines of train.py slightly modified to include the llava model), training runs but keeps showing 'loss': 0.0 and the same 'grad_norm'. Any idea?
Hi, I got the same issue. Did you manage to resolve this?
Hi, I got the same issue. What kind of GPU are you using?
For me it was a V100, @paperman-anne, running DPO training. How about you?
Have you solved it yet?
I got the same issue. Have you solved it yet?
I got the same issue. Have you solved it yet?
Hello, have you solved it yet?
Probably max_length is too small, so the GT (ground-truth answer) gets truncated?
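If it helps, here is a minimal diagnostic sketch for checking that suspicion (my own, not from the repo). It assumes the standard LLaVA-style preprocessing where non-answer tokens are masked with IGNORE_INDEX = -100, so if truncation cuts off the answer, every label in a sample is -100 and that sample contributes nothing to the loss:

```python
# Hypothetical diagnostic: count how many tokens per sample still carry a real
# label after truncation to model_max_length. Assumes the usual LLaVA convention
# that prompt/image tokens are masked with IGNORE_INDEX = -100.
IGNORE_INDEX = -100

def supervised_token_counts(labels):
    """labels: (batch, seq_len) tensor produced by the data collator."""
    return (labels != IGNORE_INDEX).sum(dim=-1)

# Usage (names are placeholders for whatever your training loop exposes):
# for batch in train_dataloader:
#     print(supervised_token_counts(batch["labels"]))
#     break
# If every count is 0, the ground-truth answer was truncated away and the
# reported loss / grad_norm will not move.
```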
My max_length is 512. But when I use LoRA, I see these warnings:

/root/lanyun-tmp/env/llava_next/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
/root/lanyun-tmp/env/llava_next/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
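For what it's worth, the first warning ("None of the inputs have requires_grad=True") is a known interaction between gradient checkpointing and LoRA: the base weights are frozen, so the checkpointed blocks see no inputs that require grad and return None gradients, which would also explain a frozen grad_norm. A hedged sketch of the usual workaround (not verified against this repo's train.py; enable_input_require_grads and gradient_checkpointing_kwargs are standard Hugging Face APIs on recent transformers versions):

```python
# Sketch of the common workaround when combining LoRA with gradient checkpointing.
# `model` is assumed to be the loaded LLaVA model (before or after PEFT wrapping).
model.enable_input_require_grads()  # make embedding outputs require grad so
                                    # checkpointed blocks receive real gradients
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}  # silences the second warning
)
```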
Bro, you are right. When I delete the
--model_max_length 32768 \
This needs to be set for any SigLIP run, or else the loss is constant and grad_norm is 0.0.
This applies to any fine-tuning run, not just LoRA fine-tuning.
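To make the max_length point concrete, a back-of-the-envelope sketch (all numbers below are my assumptions, not from this thread): a SigLIP-style vision tower contributes several hundred vision tokens per image tile, so a small model_max_length can be consumed entirely by image tokens and the answer text gets truncated, which matches the constant-loss / zero-grad_norm symptom.

```python
# Rough arithmetic only; the per-tile token count and tile count are assumptions.
IMAGE_TOKENS_PER_TILE = 729   # e.g. a 384px SigLIP tower: (384 // 14) ** 2 patches
num_tiles = 5                 # base image plus anyres tiles (assumed)
text_tokens = 200             # prompt + ground-truth answer tokens (assumed)
model_max_length = 512        # the value reported above

effective_len = IMAGE_TOKENS_PER_TILE * num_tiles + text_tokens   # 3845 here
if effective_len > model_max_length:
    print(f"{effective_len} > {model_max_length}: the answer never survives truncation")
```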