Kingsley
You can refer to this PR: https://github.com/hiyouga/LLaMA-Factory/pull/9267
This issue seems similar to #5991. In your case, the per-device batch size is set to 1, so GPU utilization differs across GPUs because the sequence lengths on each GPU are different.
My guess is that training collapsed. The loss was already very low before the collapse. Which epoch is this step in?
See https://github.com/QwenLM/Qwen3/issues/736#issuecomment-2207996348
Why enable ZeRO-3 for LoRA? It doesn't look like you are short on GPU memory.
Oh, I see. For sequences that long you do need it. If you suspect some processes are stuck, note down those problematic PIDs and use py-spy (e.g. `py-spy dump --pid <PID>`) to see what they are actually executing.
Run `sudo fuser -v /dev/nvidia*` to check whether those processes are all attached to your GPUs.
See https://github.com/huggingface/transformers/blob/51083d1bac7905aa8316b75f7897bdd4e5302044/src/transformers/models/llava_next/image_processing_llava_next.py#L726C9-L728C10

```python
return BatchFeature(
    data={"pixel_values": processed_images, "image_sizes": image_sizes}, tensor_type=return_tensors
)
```

After going through the LLaVA-NeXT image processor, the `image_sizes` key should be present in the output. Is the image input correct?
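If it helps, here is a minimal sketch to check that key directly; the image path and the `llava-hf/llava-v1.6-vicuna-7b-hf` checkpoint are placeholders for your own inputs:

```python
# a minimal sketch, assuming a local image "test.jpg" and a LLaVA-NeXT checkpoint;
# both the path and the model id are placeholders for your own inputs.
from PIL import Image
from transformers import LlavaNextImageProcessor

image_processor = LlavaNextImageProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-7b-hf")
features = image_processor(images=Image.open("test.jpg"), return_tensors="pt")

# both keys should be present if the image went through the processor correctly
print(features.keys())  # expect "pixel_values" and "image_sizes"
```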
1. For a reasoning model, it is better to construct some long-CoT data for SFT; that matches the data distribution the reasoning model saw during training more closely (a sketch of such a sample follows below).
   ```
   ans: xxxxyyy
   ```
2. Judging from this loss, it is still a bit overfit: reduce the number of epochs and lower the learning rate.
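As a rough illustration only, here is a minimal sketch of what one long-CoT sample could look like in sharegpt format; the `<think>` tags, the question/answer text, and the file name are assumptions for illustration, not anything from your setup:

```python
# a minimal sketch of one long-CoT SFT sample in sharegpt format;
# the <think> tags, the question/answer text, and the file name are illustrative assumptions.
import json

sample = {
    "conversations": [
        {"from": "human", "value": "How many primes are there below 10?"},
        {
            "from": "gpt",
            "value": "<think>Check 2 through 9: the primes are 2, 3, 5 and 7, so there are 4.</think>\nans: 4",
        },
    ]
}

with open("long_cot_sft.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```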
This looks like a chat_template.jinja problem. Diff your template against the original model's template and see.
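If you want to compare them programmatically, here is a minimal sketch assuming both models load with `AutoTokenizer`; the two paths are placeholders for your base model and your fine-tuned output:

```python
# a minimal sketch to diff chat templates; both paths are placeholders for your own models.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("path/to/original/model").chat_template or ""
mine = AutoTokenizer.from_pretrained("path/to/your/finetuned/model").chat_template or ""

if base != mine:
    # dump both templates so you can diff them with your usual tools
    with open("base_template.jinja", "w", encoding="utf-8") as f:
        f.write(base)
    with open("mine_template.jinja", "w", encoding="utf-8") as f:
        f.write(mine)
    print("Templates differ; diff base_template.jinja against mine_template.jinja")
else:
    print("Templates are identical")
```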