MARCONet
Multi-GPU training
Hi, multi-GPU training keeps failing for me. Could you share a screenshot of `pip list` showing the version of each installed package, or an environment image file if one is available?
Hi, could you show me the error you get?
You can refer to the packages that I use.
Thanks. The problem I encountered is with multi-node, multi-GPU training. Single-GPU training works fine.
I am not sure what causes this. Maybe you can check whether the number of GPU IDs in CUDA_VISIBLE_DEVICES equals the nproc_per_node parameter.
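A minimal sketch of that check, assuming the job is started with a PyTorch launcher that takes `--nproc_per_node` (for example `torchrun`). The script and the helper names `count_visible_gpus` and `check_nproc_per_node` are hypothetical, not part of MARCONet. In a multi-node run, `nproc_per_node` applies to each node separately, so the check has to pass on every node:

```python
import os
import sys


def count_visible_gpus(env_value):
    """Count the GPU IDs in a CUDA_VISIBLE_DEVICES-style string such as "0,1,2,3".

    Returns None when the variable is unset, because then every GPU on the
    node is visible and the count cannot be read from the variable itself.
    """
    if env_value is None:
        return None
    ids = [tok.strip() for tok in env_value.split(",") if tok.strip()]
    return len(ids)


def check_nproc_per_node(nproc_per_node, env_value):
    """Return True/False depending on whether the counts match, or None if unset."""
    n_visible = count_visible_gpus(env_value)
    if n_visible is None:
        return None
    return n_visible == nproc_per_node


if __name__ == "__main__":
    # Hypothetical usage on each node: python check_gpus.py <nproc_per_node>
    nproc = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    env = os.environ.get("CUDA_VISIBLE_DEVICES")
    result = check_nproc_per_node(nproc, env)
    if result is None:
        print("CUDA_VISIBLE_DEVICES is not set; all GPUs on this node are visible.")
    elif result:
        print(f"OK: {nproc} visible GPU IDs match nproc_per_node={nproc}.")
    else:
        print(f"Mismatch: CUDA_VISIBLE_DEVICES={env!r} lists "
              f"{count_visible_gpus(env)} IDs, but nproc_per_node={nproc}.")
```

For example, with `CUDA_VISIBLE_DEVICES=0,1,2,3` on a node, the launcher would be expected to use `--nproc_per_node=4` on that node.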