[minillm] does qwen2 support model parallelism?
When performing SFT on the Qwen2.5 32B model, I ran into an out-of-memory (OOM) error. Does the Qwen series of models support model parallelism during SFT?
Yes. You can use model parallelism just like with other models by setting

OPTS+=" --model-parallel"
OPTS+=" --model-parallel-size ${MP_SIZE}"

where ${MP_SIZE} is the model-parallel degree, i.e. the number of GPUs each model replica is partitioned across.
Thank you! That solved the problem. I would also like to ask: when using Qwen models for supervised fine-tuning (SFT) on my custom dataset, the validation loss first decreases and then increases, and both the teacher model and the student model produce repetitive generations. How did you resolve this issue?
You can increase the sampling temperature or apply a repetition penalty during generation.
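For reference, here is a minimal sketch of how those two knobs look in a plain Hugging Face `transformers` generation call; the checkpoint name and the exact values are illustrative assumptions, not settings taken from this repo:

```python
# Minimal sketch: temperature + repetition penalty with Hugging Face transformers.
# The checkpoint and parameter values below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Explain model parallelism in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,          # sampling must be enabled for temperature to take effect
    temperature=1.0,         # raise (e.g. toward 1.2) to flatten the next-token distribution
    repetition_penalty=1.2,  # values > 1.0 down-weight tokens that have already appeared
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A higher temperature spreads probability mass over more candidate tokens, while a repetition penalty above 1.0 penalizes tokens that were already generated, which together usually reduce repetitive loops in the output.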