
I have a problem during training

Open · RobinLiuZX opened this issue 1 year ago · 5 comments

question

I trained with your open-source code on a single GPU on our server, and training got stuck at this point. I set up the environment exactly as you specified (pytorch==1.12, python==3.7.9). Do you have any idea what might cause this?

RobinLiuZX · Aug 09 '22 04:08

If you are using a single GPU, have you tried setting GPUS=0 here?

Also, have you checked GPU memory usage? It is also possible that an out-of-memory (OOM) error occurred and the process stalled.
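A minimal sketch of checking GPU memory usage from Python while training runs, assuming the NVIDIA driver's `nvidia-smi` tool is on the `PATH`. It uses the documented `--query-gpu` / `--format=csv,noheader,nounits` options (memory values are reported in MiB); the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, not part of this repository.

```python
import shutil
import subprocess
from typing import List, NamedTuple


class GpuMemory(NamedTuple):
    index: int
    name: str
    used_mib: int
    total_mib: int


def parse_gpu_memory(csv_text: str) -> List[GpuMemory]:
    """Parse output of `nvidia-smi --query-gpu=index,name,memory.used,memory.total
    --format=csv,noheader,nounits` (one GPU per line) into records."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        # First field is the index, last two are used/total memory;
        # anything in between is the GPU name.
        name = ",".join(fields[1:-2])
        rows.append(GpuMemory(int(fields[0]), name, int(fields[-2]), int(fields[-1])))
    return rows


def query_gpu_memory() -> List[GpuMemory]:
    """Run nvidia-smi once and return per-GPU memory usage in MiB."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        for gpu in query_gpu_memory():
            pct = 100.0 * gpu.used_mib / gpu.total_mib
            print(f"GPU {gpu.index} ({gpu.name}): "
                  f"{gpu.used_mib}/{gpu.total_mib} MiB ({pct:.1f}%)")
```

Running this in a second terminal while training appears stuck shows whether memory is close to the card's limit; usage far below the limit makes a plain OOM less likely.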

twke18 · Aug 09 '22 04:08

Thank you for your reply. My GPU is an NVIDIA A100 with 40960 MB of memory. With the batch size of 16 you set, training uses only 9683 MB, and I did set GPUS=0. But the problem persists.

RobinLiuZX · Aug 09 '22 06:08
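Since memory usage here is well below the card's capacity, a generic way to see where a stuck Python training process is waiting is the standard-library `faulthandler` module. The sketch below is an assumption-laden illustration, not part of this repository: the helper names `enable_hang_dump` / `disable_hang_dump` are hypothetical, and the 600-second interval is arbitrary. It periodically writes the stack of every thread to stderr and, on Unix, also dumps on demand when the process receives `SIGUSR1`.

```python
import faulthandler
import signal
import sys


def _stderr():
    # Prefer the original stderr, which always has a real file descriptor.
    return sys.__stderr__ or sys.stderr


def enable_hang_dump(timeout_s: float = 600.0) -> None:
    """Dump all thread stacks every `timeout_s` seconds until cancelled."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=_stderr())
    # On Unix, `kill -USR1 <pid>` dumps the stacks immediately.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=_stderr(), all_threads=True)


def disable_hang_dump() -> None:
    """Cancel the periodic dump and the SIGUSR1 handler."""
    faulthandler.cancel_dump_traceback_later()
    if hasattr(faulthandler, "unregister") and hasattr(signal, "SIGUSR1"):
        faulthandler.unregister(signal.SIGUSR1)
```

Calling `enable_hang_dump()` near the top of the training script, before the training loop starts, means the dump shows which call the main process is blocked in (for example, waiting on a data-loader queue). Note that it only covers the process it was enabled in, not any worker subprocesses.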

I encountered the same problem...

Lan-MC · Sep 03 '22 13:09

@RobinLiuZX, do you still have this problem?

@Lan-MC, are you also using a single GPU for training?

twke18 · Sep 03 '22 15:09

I use 2 GPUs for training.

Lan-MC · Sep 04 '22 03:09