Fine-tuning the 78B model on a multi-node SLURM system
Hi,
I would like to fine-tune on a multi-node GPU system. Each node has 8 A100 GPUs, and the system uses SLURM. I am not sure whether the fine-tune command below works:
PARTITION='your partition' GPUS=32 PER_DEVICE_BATCH_SIZE=1 sh shell/internvl2.5/2nd_finetune/internvl2_5_78b_dynamic_res_2nd_finetune_full.sh
I cannot tell whether it works, because I need to activate a conda virtual environment on the node, and I do not know how to set up the conda environment in the fine-tune command above.
Can anyone help?
Thank you.
Autocar
Hi, this script was originally intended to be run on SLURM. First activate the conda environment you need with conda activate, and then run the finetune_full.sh script. That should work.
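A minimal sketch of that workflow, assuming the script launches its workers with srun (the PARTITION variable suggests it submits through SLURM itself) and that the conda installation lives on a filesystem shared with the compute nodes. By default (unless the site changes it) srun exports the calling shell's environment to the tasks it starts, which is why activating the environment in the shell you launch from is usually enough. The environment name internvl is hypothetical; substitute your own.

```shell
# Run these in one shell on the login node.
# "internvl" is a hypothetical environment name; use your own.
conda activate internvl

# Optional sanity check: ask one compute node in the partition which Python
# it resolves. It should print the same path as `which python` on the login node.
srun -p 'your partition' -N 1 -n 1 which python

# Launch from the same shell so the activated environment (PATH etc.)
# is inherited by the tasks SLURM starts on the GPU nodes.
PARTITION='your partition' GPUS=32 PER_DEVICE_BATCH_SIZE=1 sh shell/internvl2.5/2nd_finetune/internvl2_5_78b_dynamic_res_2nd_finetune_full.sh
```

If the sanity check prints a different interpreter, the compute nodes probably cannot see the conda installation, or the site has changed the default environment export.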
Hi @lll2343:
Thank you for replying to my question. When I log in to this system, I land on the so-called "login node", which has no GPUs. The GPUs are on other nodes. Do you mean I should activate conda on the login node and then run finetune_full.sh from the login node as well?