Running with distributed

Open • Liu-pf opened this issue 3 years ago • 1 comment

Hello, Duan. I am trying to reproduce this code, and I have some questions about running it in distributed mode. I can't use this command:

srun -n your_node_nums --gres gpu:gpunums -p your_partition

It fails with this error: The program 'srun' is currently not installed. To run 'srun' please ask your administrator to install the package 'slurm-client'

So, when CUDA runs out of memory, what should I do to solve this problem? Looking forward to your help!

Liu-pf, Nov 11 '21 10:11

Sorry for the late reply. I haven't looked at this repo for a long time; like you, I was just trying to reproduce this method and am not the author of the paper.

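I think installing the Slurm client on your server (the 'slurm-client' package named in the error, which provides srun) could solve this problem. The search stage is memory-consuming, so you can use gradient checkpointing (torch.utils.checkpoint) or simply use a smaller batch size to alleviate the out-of-memory error.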

duanyiqun, Oct 31 '22 04:10
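The two memory-saving suggestions in the reply above (gradient checkpointing and a smaller batch size) can be sketched in plain PyTorch roughly as follows. This is a minimal sketch, not code from this repo: `CheckpointedStack`, its layer sizes, and the `batch_size` value are hypothetical stand-ins for the real search network and config, and it assumes a PyTorch version whose `torch.utils.checkpoint.checkpoint` accepts `use_reentrant`. Checkpointing trades memory for compute: activations inside each wrapped block are discarded after the forward pass and recomputed during backward.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedStack(nn.Module):
    """Hypothetical stand-in for a searched network: a stack of blocks whose
    intermediate activations are recomputed in backward instead of stored."""

    def __init__(self, width: int = 64, depth: int = 4, use_checkpoint: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.use_checkpoint and self.training:
                # Only the block's input is kept; its internal activations
                # are recomputed when backward reaches this block.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


if __name__ == "__main__":
    model = CheckpointedStack()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # Hypothetical value: lower it further if CUDA still runs out of memory.
    batch_size = 16
    for step in range(3):
        inputs = torch.randn(batch_size, 64)
        loss = model(inputs).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"step {step}: loss={loss.item():.4f}")
```

The sketch runs on CPU for simplicity; the memory saving only matters once the model and inputs are moved to a GPU. Because the blocks are deterministic (no dropout), the recomputed activations match the originals, so checkpointing changes peak memory but not the gradients.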