Auto-ReID-Fast
Running with distributed
Hello, Duan. I am trying to reproduce this code, and I have some questions about running it in distributed mode. I can't use this command:
srun -n your_node_nums --gres gpu:gpunums -p your_partition
Error:
The program 'srun' is currently not installed. To run 'srun' please ask your administrator to install the package 'slurm-client'
Also, when I hit a CUDA out of memory error, what should I do to solve this problem? Looking forward to your help!
Sorry for the late reply. I haven't looked at this repo for a long time; I was also just reproducing this method and am not the author of the paper.
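I think installing the Slurm client (the 'slurm-client' package from the error message, which provides srun) on your server could solve the first problem. As for the out-of-memory error, the search phase is memory-intensive, so you can use torch checkpoint (torch.utils.checkpoint) or simply use a smaller batch size to alleviate it.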
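For the checkpoint suggestion, here is a minimal sketch of how activation checkpointing could be wrapped around one block. It is not code from this repo: `CheckpointedBlock` and its layers are hypothetical stand-ins for a stage of the search network, and the `use_reentrant=False` argument assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Hypothetical block standing in for one stage of the search network."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Intermediate activations of self.body are not kept after the forward
        # pass; they are recomputed during backward, trading compute for memory.
        return checkpoint(self.body, x, use_reentrant=False)


if __name__ == "__main__":
    block = CheckpointedBlock(channels=16)
    images = torch.randn(8, 16, 32, 32)  # a smaller batch size also lowers memory use
    loss = block(images).sum()
    loss.backward()  # gradients match the non-checkpointed computation
```

Checkpointing only saves memory for the wrapped module, so it tends to pay off most on the largest blocks. Reducing the batch size is the simpler option if the extra recomputation time is a concern.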