[Feature] Torchrun for MPO training

Open MathewCrespo opened this issue 9 months ago • 1 comments

Motivation

Many of us only have a single node with several GPUs, and it is more common to use torchrun than srun. Hopefully, there will be an official script for MPO training with torchrun.

Related resources

No response

Additional context

No response

MathewCrespo avatar Mar 06 '25 08:03 MathewCrespo

@lvhan028 @whai362

hekaijie123 avatar Mar 07 '25 07:03 hekaijie123

You can simply replace the srun command with torchrun, just as in the fine-tuning script; it works. See this issue for reference: https://github.com/OpenGVLab/InternVL/issues/856
Whether you use torchrun or Slurm, it is only the tool that launches and schedules the GPU processes, so moving to torchrun is easy.
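
As a rough illustration of the swap described above, here is a minimal sketch that assembles a single-node torchrun command line. The torchrun flags (`--nnodes`, `--nproc_per_node`, `--master_addr`, `--master_port`) are standard, but the script path `path/to/mpo_train.py`, the `--output_dir` value, and the helper `build_torchrun_command` are hypothetical placeholders, not the official InternVL MPO script or its arguments. Substitute whatever your srun script currently passes to Python.

```python
import shlex


def build_torchrun_command(script, script_args, gpus=8, master_port=29500):
    """Assemble a single-node torchrun command as a list of arguments."""
    return [
        "torchrun",
        "--nnodes=1",                    # one machine only
        f"--nproc_per_node={gpus}",      # one worker process per GPU
        "--master_addr=127.0.0.1",       # rendezvous on the local host
        f"--master_port={master_port}",  # any free port
        script,
        *script_args,
    ]


cmd = build_torchrun_command(
    "path/to/mpo_train.py",                     # hypothetical script path
    ["--output_dir", "work_dirs/mpo_example"],  # hypothetical training argument
    gpus=4,
)

# Print a shell-safe command line that can be pasted into a launch script.
print(shlex.join(cmd))
```

The arguments after the script path should stay the same as in the srun version; only the launcher prefix changes. Variables that Slurm normally sets for you, such as the number of tasks, have to be given explicitly through the torchrun flags.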

TimeOverflow avatar May 22 '25 02:05 TimeOverflow

You can refer to this script.

Weiyun1025 avatar Aug 30 '25 04:08 Weiyun1025