Chunjiang Ge (葛春江)

39 comments by Chunjiang Ge (葛春江)

> 8B model, zero3 could run.

Sorry, I mistook that for 8× 40G GPUs. Maybe you should try it on 8 GPUs or with the 2B/4B model. Otherwise, you could try LoRA, as sketched below.
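For reference, a minimal LoRA sketch using the peft library; the checkpoint path and target modules are placeholders, not this repo's actual config, so adjust them for your model.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical checkpoint path; replace with your 8B model.
model = AutoModelForCausalLM.from_pretrained("your-8b-checkpoint")

lora_config = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (model-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights train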

Hello, do you still hit this bug with the latest code? It does not seem to be a bug related to our code.

The LoRA training code will be updated; please wait a bit.

If you want to finetune the model to align with tasks that did not exist in the pretraining stage, you should set a higher learning rate. If you do not want to...
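A minimal sketch of where that knob lives, assuming the HuggingFace Trainer; the numbers are illustrative, not recommendations from this thread.

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    learning_rate=1e-4,   # raised (e.g. from 2e-5) for tasks absent from pretraining
    warmup_ratio=0.03,
    num_train_epochs=1,
)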

Yes, you can finetune the 8B model with ZeRO stage 3. DeepSpeed does not support MoE models with ZeRO-3.
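For context, a minimal ZeRO-3 configuration passed to the HuggingFace Trainer as a dict; the "auto" entries let the Trainer fill values from TrainingArguments, and the CPU offload lines are an optional assumption for memory-constrained setups.

from transformers import TrainingArguments

ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,  # shard params, gradients, and optimizer states
        "offload_param": {"device": "cpu"},      # optional: offload params to CPU
        "offload_optimizer": {"device": "cpu"},  # optional: offload optimizer states
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
}

args = TrainingArguments(output_dir="outputs", deepspeed=ds_config)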

1. Just make sure the parsed output format is consistent. 2. Yes, you can. 3. Add the special tokens to the vocabulary, then check that they are tokenized correctly.
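A short sketch of point 3 with the HuggingFace tokenizer API; the checkpoint path and token names are hypothetical placeholders.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-checkpoint")  # hypothetical path
model = AutoModelForCausalLM.from_pretrained("your-checkpoint")

# Register the new special tokens, then resize embeddings to match.
new_tokens = ["<task_start>", "<task_end>"]  # hypothetical token names
tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.resize_token_embeddings(len(tokenizer))

# Verify each token maps to a single id instead of being split apart.
for tok in new_tokens:
    ids = tokenizer.encode(tok, add_special_tokens=False)
    assert len(ids) == 1, f"{tok} was split into {ids}"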

https://github.com/TideDra/lmm-r1 This repo supports vision-language model training with OpenRLHF, so it may not require many modifications.

Did you set max output tokens too low?

generated_ids = model.generate(**inputs, max_new_tokens=128)
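A self-contained version of the snippet above, assuming a generic HuggingFace causal LM; the checkpoint path and prompt are placeholders, and the point is simply that a larger max_new_tokens prevents answers from being cut off.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-checkpoint")  # hypothetical path
model = AutoModelForCausalLM.from_pretrained("your-checkpoint")

inputs = tokenizer("Describe the image.", return_tensors="pt")
# Raise max_new_tokens if outputs stop mid-sentence; 128 is often
# too few for long answers.
generated_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))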