Zhang815
Hi, thank you for your reply. After setting the distributed environment variables (`GPUS_PER_NODE=1 WORKER_CNT=1 export MASTER_ADDR=localhost export MASTER_PORT=8214 export RANK=0`), another error appears: `ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 255) local_rank: 0`. Is there any...
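For reference, here is a minimal sketch of what a consistent single-node, single-GPU setup might look like. The variable names are the ones used in this thread; the values, and the choice to `export` every variable rather than only some of them, are assumptions and not confirmed against the actual script:

```
# Hedged sketch: single node, single GPU; values are illustrative.
export GPUS_PER_NODE=1        # GPUs available on this machine
export WORKER_CNT=1           # total number of worker nodes
export MASTER_ADDR=localhost  # rendezvous address (this machine)
export MASTER_PORT=8214       # any free TCP port works
export RANK=0                 # index of this node; 0 for a single node
```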
Because I am not familiar with setting up the distributed environment, maybe the parameters are not right... Here it is:

```
2022-12-03 15:04:01 - instantiator.py[line:21] - INFO: Created a...
```
Hi, sorry for replying late, and thank you for your patience. I just changed the beginning of the train_vqa_distributed.sh script and ran it with the bash command. This is your...
```
#!/usr/bin/env bash
# Guide:
# This script supports distributed training on multi-gpu workers (as well as single-worker training).
# Please set the options below according to the comments.
# For...
```
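For context, a distributed training script along these lines typically ends by invoking PyTorch's elastic launcher with the variables set above. The sketch below is hypothetical and not taken from train_vqa_distributed.sh; `train.py` is a placeholder for whatever entry point the script actually calls:

```
# Hypothetical launch command using torchrun (ships with recent PyTorch);
# older scripts may use `python -m torch.distributed.launch` instead.
torchrun --nproc_per_node=${GPUS_PER_NODE} \
         --nnodes=${WORKER_CNT} \
         --node_rank=${RANK} \
         --master_addr=${MASTER_ADDR} \
         --master_port=${MASTER_PORT} \
         train.py  # placeholder for the real training entry point
```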