PIDNet

Training hangs in the middle

Open fschvart opened this issue 1 year ago • 0 comments

Hi,

I've been seeing this issue on multiple machines: when I train a model on a custom dataset, training hangs partway through without any error. It simply stops working: the GPU temperature drops and the epoch/iteration counters no longer advance.
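A silent hang like this leaves no traceback, so one way to find out where the process is stuck is a watchdog built on Python's standard `faulthandler` module. The sketch below is only an illustration under stated assumptions: the helper names `heartbeat` and `disarm`, the 30-minute default timeout, and the idea of calling `heartbeat` once per training iteration are hypothetical and not part of PIDNet. Each call re-arms the timer, so stack traces of all Python threads are dumped only if no iteration completes within the timeout.

```python
import faulthandler
import sys


def heartbeat(timeout_s=1800.0, stream=None):
    """Re-arm the watchdog (hypothetical helper, not part of PIDNet).

    If this is not called again within `timeout_s` seconds, faulthandler
    writes the stack traces of all Python threads to `stream`.
    Calling it again replaces the previous timer, which resets the countdown.
    """
    target = stream if stream is not None else sys.stderr
    # `target` must be a real file object exposing fileno().
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=target)


def disarm():
    """Cancel the pending dump, e.g. once training finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `heartbeat()` at the end of every iteration and `disarm()` after the last epoch would be one way to use it. The dump should at least show whether the main thread is blocked in the data loader, in a CUDA call, or somewhere else.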

Sometimes it happens after 20 epochs; once it got as far as epoch 150 before stopping. Has anyone seen something similar? I suspect it might be related to my CUDA/PyTorch versions. Which versions would you recommend?
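For a version question like this, it helps to report the exact environment from each affected machine. A minimal sketch of how that could be collected, assuming PyTorch is importable in the training environment (the function name `collect_versions` is hypothetical, and the fields stay `None` when PyTorch or CUDA is missing):

```python
import platform


def collect_versions():
    """Gather Python / PyTorch / CUDA / cuDNN versions (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cudnn": None,
        "gpu": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here; report only the Python version.
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Comparing this output across the machines would show whether they all share the same CUDA/PyTorch combination.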

Thanks!

fschvart, May 18 '23 23:05