
Fine-tune 78B model on multi-node SLURM system

Open spcrobocar opened this issue 10 months ago • 2 comments

Hi,

I would like to fine-tune on a multi-node GPU system. Each node has 8 A100s, and the system uses SLURM. I am not sure whether the fine-tune command below works:

PARTITION='your partition' GPUS=32 PER_DEVICE_BATCH_SIZE=1 sh shell/internvl2.5/2nd_finetune/internvl2_5_78b_dynamic_res_2nd_finetune_full.sh

I do not know if it works because I need to set up a conda virtual env on the node, and I do not know how to set the conda virtual env in the fine-tune command above.

Can anyone help?

Thank you.

Autocar

spcrobocar, Feb 10 '25 20:02

Hi, this script is intended to be run on SLURM. First activate the conda environment you need with conda activate, then run the finetune_full.sh script. That should be enough.
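A minimal sketch of that workflow, assuming the fine-tuning script calls `srun` itself (as its `PARTITION` and `GPUS` variables suggest) and that the conda installation lives on a filesystem the compute nodes can also see. By default `srun` passes the submitting shell's environment on to the job, so an environment activated on the login node should carry over. The environment name `internvl`, the `DRY_RUN` switch, and the `nodes_needed` helper are hypothetical names used only for illustration; they are not part of the InternVL repository.

```shell
#!/usr/bin/env bash
# Sketch only: CONDA_ENV, DRY_RUN and nodes_needed are illustrative names,
# not part of the InternVL repository.

CONDA_ENV="${CONDA_ENV:-internvl}"      # hypothetical environment name
PARTITION="${PARTITION:-your partition}"
GPUS="${GPUS:-32}"
GPUS_PER_NODE="${GPUS_PER_NODE:-8}"     # 8 A100s per node, as in the question
SCRIPT="shell/internvl2.5/2nd_finetune/internvl2_5_78b_dynamic_res_2nd_finetune_full.sh"

# Ceiling division: how many nodes a given GPU count spans.
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

echo "Requesting ${GPUS} GPUs on $(nodes_needed "$GPUS" "$GPUS_PER_NODE") node(s)"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print what would be run, so nothing is submitted by accident.
  echo "conda activate ${CONDA_ENV}"
  echo "PARTITION='${PARTITION}' GPUS=${GPUS} PER_DEVICE_BATCH_SIZE=1 sh ${SCRIPT}"
else
  # Load conda's shell hook so `conda activate` works in a non-interactive shell.
  . "$(conda info --base)/etc/profile.d/conda.sh"
  conda activate "$CONDA_ENV"
  PARTITION="$PARTITION" GPUS="$GPUS" PER_DEVICE_BATCH_SIZE=1 sh "$SCRIPT"
fi
```

Run it from the login node with `DRY_RUN=0` once the printed command looks right. If the compute nodes cannot see the conda installation, the activation would instead have to happen inside the job itself.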

lll2343, Feb 11 '25 03:02

Hi @lll2343:

Thank you for replying to my question. When I log in to this system, I am on the so-called "login node", which does not have GPUs. The GPUs are on different nodes. Do you mean I should activate conda on the login node and then run finetune_full.sh on the login node?

spcrobocar, Feb 12 '25 17:02