raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly
Hi, thanks for your code, it's a good job, but I have met some problems when beginning to train it. I hope you can give me some advice, thanks! Here are my specific problems:
My training environment is Python 3.7.9 with PyTorch 1.7.1 and CUDA 11.0. I learned that your training environment is Python 2 with PyTorch 0.4.0, so I revised some code (e.g. the print format) for Python 3, and to solve the shape mismatch between MNIST and MNIST-M I revised the transform code to align them.
After those small revisions the code runs, but after one epoch it raises the following problem:
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 885, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 17440, 2128, 16900, 13076, 17672, 15916) exited unexpectedly
To solve this problem, I set num_workers to 0 or 1 and reduced the batch size from 64 to 32, 16, 8, and even 1, but it still raises this error after the first epoch. I don't know how to solve it right now; could you give me some help? Thanks!
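As a minimal sketch of the debugging step above (the dataset here is a hypothetical stand-in, not the repo's MNIST loader): running the DataLoader with num_workers=0 keeps loading in the main process, so any exception in the dataset surfaces directly instead of being masked as "worker exited unexpectedly".

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in dataset: 64 fake grayscale 28x28 images with dummy labels.
dataset = TensorDataset(torch.randn(64, 1, 28, 28),
                        torch.zeros(64, dtype=torch.long))

# num_workers=0 loads batches in the main process; slower, but worker
# crashes cannot occur and the real underlying error is raised directly.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

images, labels = next(iter(loader))
print(tuple(images.shape))  # one batch of shape (16, 1, 28, 28)
```

If the error disappears with num_workers=0, the problem is usually in worker-process setup (e.g. Windows multiprocessing) rather than in the batch size.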
I have solved this problem on my other server, with this environment: PyTorch 1.6.0, torchvision 0.7.0, and Ubuntu 16.04.
My method for dealing with it is as follows: when I switched my training to this server, I found the same situation that others have asked about before, namely that the shapes of MNIST and MNIST-M are not aligned, even though the code worked on my previous server. So I deduced that the problem may be caused by the dataloader code, and I revised it as follows, which solved the problem above:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # repeat the channel dim from 1 to 3
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
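The channel alignment that the Lambda performs can be sketched outside torchvision with NumPy (the array names here are illustrative): np.tile mirrors what torch's x.repeat(3, 1, 1) does to a (1, H, W) tensor.

```python
import numpy as np

# Fake 28x28 grayscale MNIST image with a leading channel axis: (1, 28, 28).
x = np.zeros((1, 28, 28), dtype=np.float32)

# Copy the single channel three times along axis 0 so the shape matches
# MNIST-M's RGB images, i.e. (1, 28, 28) -> (3, 28, 28).
x_rgb = np.tile(x, (3, 1, 1))
print(x_rgb.shape)  # (3, 28, 28)
```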
However, a new problem has appeared: the target dif loss and the source dif loss always stay at 0, which is abnormal.
It looks like the model is not doing any back-propagation.
@FLAWLESSJade I got the same issue, did you solve it?