HSG
I have a problem during training
I trained with your open-source code on a GPU card on our server, and training got stuck at this point in the progress. I set up the environment exactly as you specified, using pytorch==1.12 and python==3.7.9. What are your thoughts on this issue?
If you are using a GPU card, have you tried to set GPUS=0 here?
Also, have you checked the occupancy of GPU memory? It's also likely that OOM occurred and the process halted.
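One way to check the memory occupancy is to query `nvidia-smi` while the training process appears stuck. The snippet below is only a minimal sketch and not part of this repository: the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports plain integer MiB values for every GPU.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB rows (one row per GPU) into (used, total) int tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return [(used_mib, total_mib), ...] from nvidia-smi, or [] if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


if __name__ == "__main__":
    # Hypothetical usage: run this in a second shell while training seems to hang.
    for index, (used, total) in enumerate(query_gpu_memory()):
        print(f"GPU {index}: {used} / {total} MiB ({100 * used / total:.1f}% used)")
```

If the used memory sits close to the total, OOM is a plausible cause. If it stays well below the total, the hang probably comes from somewhere else.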
Thank you for your reply. My GPU is an NVIDIA A100 with 40960 MB of memory. With the batch size of 16 that you set, training only takes up 9683 MB, and I have set GPUS=0. But my problem is still the same.
I encountered the same problem...
@RobinLiuZX, do you still have the problem?
@Lan-MC, are you also using a single GPU for training?
I use 2 GPUs for training.