Yu-won Lee
Could you show me your dataset example and your training script? I think something is wrong with the optimizer or the DeepSpeed config. I need to see them to track down the error.
I haven't tested it, but one thing you could check is upcasting the model to fp32. Also, the code is mainly set up for DeepSpeed, so...
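For example, something like this (a rough sketch assuming a Hugging Face-style model; swap in the model class and path you actually use):
```python
import torch
from transformers import AutoModelForCausalLM

# Load directly in fp32 instead of fp16/bf16.
# "your-model-path" is just a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "your-model-path",
    torch_dtype=torch.float32,
)

# Or, if the model is already loaded in half precision, upcast it:
model = model.to(torch.float32)
```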
Are you trying to finetune the embedding too?
The code is not set up for multi-label, so you need to tweak `cls_dataset.py` a bit.
@BingfengHan @Bentonmaster For multi-label classification, you should change the label part in `cls_dataset.py`. The dataset format should look like
```
"label": [
    class1: 0
    class2: 1
    class3: 1
]...
```
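Roughly, the label handling could then look like this (just a sketch, not the exact code in `cls_dataset.py`; the class list and field names are placeholders):
```python
import torch

CLASSES = ["class1", "class2", "class3"]  # placeholder class names

def encode_labels(label_dict):
    # Turn {"class1": 0, "class2": 1, "class3": 1} into a multi-hot float tensor.
    return torch.tensor([float(label_dict.get(c, 0)) for c in CLASSES])

# Multi-label training uses BCE over per-class logits instead of cross-entropy.
loss_fn = torch.nn.BCEWithLogitsLoss()
# loss = loss_fn(logits, encode_labels(sample["label"]).unsqueeze(0))
```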
Your dataset has no `` token. That might be the issue.
Are you measuring exact-token match on the answer, or the perplexity? Also, the model could have a problem like catastrophic forgetting.
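For reference, this is the kind of check I mean (a sketch, not the repo's eval code):
```python
import math
import torch

def exact_match(pred_text: str, gold_text: str) -> bool:
    # Exact string/token comparison of the generated answer against the gold answer.
    return pred_text.strip() == gold_text.strip()

@torch.no_grad()
def perplexity(model, input_ids, labels):
    # Perplexity = exp(mean cross-entropy loss) on the gold answer tokens.
    loss = model(input_ids=input_ids, labels=labels).loss
    return math.exp(loss.item())
```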
Did you merge the LoRA weights? Or it could be an issue where the updates to the LoRA weights were insufficient. You could increase the rank and the alpha.
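If you haven't merged, something like this should work (a sketch assuming the adapter was trained with PEFT; the paths are placeholders):
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, attach the LoRA adapter, then fold the deltas into the base weights.
base = AutoModelForCausalLM.from_pretrained("base-model-path", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
model = model.merge_and_unload()
model.save_pretrained("merged-model")
```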
The fact that accuracy falls while loss → 0 is usually a sign that either 1. the adapter weights are not the ones being evaluated, or 2. the generation-time prompt /...
You should freeze the merger too. https://github.com/2U1/Qwen2-VL-Finetune/blob/d5ebc1a417b1d1bf1da24ba8c20e8406356cdcb2/scripts/finetune_lora_vision.sh#L17 Also, if you were using the unmerged weights, check that you have properly loaded the merger (projection) weights.
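If you want to check or do it in code rather than via the script flag, a rough sketch (assuming the projector parameters have "merger" in their names, like Qwen2-VL's `visual.merger`):
```python
# Freeze every parameter whose name contains "merger" (the vision->LLM projector).
for name, param in model.named_parameters():
    if "merger" in name:
        param.requires_grad = False

# Sanity check: nothing under "merger" should still be trainable.
still_trainable = [n for n, p in model.named_parameters() if "merger" in n and p.requires_grad]
assert not still_trainable, f"Merger params still trainable: {still_trainable}"
```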