DSS-pytorch

RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1

Open Kyle0936 opened this issue 6 years ago • 2 comments

Sorry to bother you, but I ran into another issue when I tried to train on my own dataset:

python main.py --mode='train' --train_path='data/images' --label_path='data/ground_truth_mask' --batch_size=8 --visdom=False

It seemed to run correctly at first, but then this error appeared:

......
The number of parameters: 62238175
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:138: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  utils.clip_grad_norm(self.net.parameters(), self.config.clip_gradient)
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:141: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  loss_epoch += loss.cpu().data[0]
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:143: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  epoch, self.config.epoch, i, iter_num, loss.cpu().data[0]))
epoch: [0/500], iter: [0/3], loss: [4.8447]
/Users/kyle/Documents/MATLAB/DSS-py/solver.py:145: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  error = OrderedDict([('loss:', loss.cpu().data[0])])
epoch: [0/500], iter: [1/3], loss: [4.8428]
epoch: [0/500], iter: [2/3], loss: [4.8411]
thread_monitor No such process in pthread_detach
Traceback (most recent call last):
  File "main.py", line 86, in <module>
    main(config)
  File "main.py", line 25, in main
    train.train()
  File "/Users/kyle/Documents/MATLAB/DSS-py/solver.py", line 127, in train
    for i, data_batch in enumerate(self.train_loader):
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/kyle/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/Users/kyle/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 16126) exited unexpectedly with exit code 1.

I really appreciate your generous help!

Kyle0936 · Jul 24 '18 21:07

Please check your image and label directories: are there any non-image files in them? (I tried training on my own computer and did not get the error you encountered.)
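A quick way to run that check is sketched below. It is only an illustration, not part of this repository: the helper name `find_non_image_files` and the extension list are assumptions, and the two directory names are simply the ones from the command in the report.

```python
import os

# Assumption: extensions the data loader is expected to read; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_non_image_files(directory, extensions=IMAGE_EXTENSIONS):
    """Return the sorted names of entries in `directory` that are not image files."""
    suspicious = []
    # os.listdir never returns '.' or '..', but it does return hidden files.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        ext = os.path.splitext(name)[1].lower()
        # Flag sub-directories and anything without a known image extension.
        if not os.path.isfile(path) or ext not in extensions:
            suspicious.append(name)
    return sorted(suspicious)


if __name__ == "__main__":
    # Directory names taken from the training command in this issue.
    for directory in ("data/images", "data/ground_truth_mask"):
        if os.path.isdir(directory):
            print(directory, find_non_image_files(directory))
```

If both lists come back empty, constructing the `torch.utils.data.DataLoader` with `num_workers=0` loads the data in the main process, so the exception that killed the worker is raised directly instead of the generic "exited unexpectedly" message.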

AceCoooool · Jul 25 '18 02:07

[Screenshots: "screen shot 2018-07-25 at 1 41 20 am" and "screen shot 2018-07-25 at 1 41 39 am"]

Here are my image set and ground truth set. I also used "ls -a" to check and removed '.DS_Store'. The only non-image entries left are '.' and '..', which clearly should not be deleted. Are there any requirements for the names or formats of the images?
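One naming property that can be checked without knowing the loader's exact requirements is whether every image has a mask with the same base name. The sketch below assumes, purely for illustration, that images and masks are paired by file name without extension; `unpaired_stems` is a hypothetical helper, not a function from this repository.

```python
import os


def unpaired_stems(image_dir, label_dir):
    """Return (image stems with no label, label stems with no image)."""

    def stems(directory):
        # Ignore hidden entries such as '.DS_Store'; compare names without extension.
        return {
            os.path.splitext(name)[0]
            for name in os.listdir(directory)
            if not name.startswith(".")
        }

    images, labels = stems(image_dir), stems(label_dir)
    return sorted(images - labels), sorted(labels - images)


if __name__ == "__main__":
    # Directory names taken from the training command in this issue.
    if os.path.isdir("data/images") and os.path.isdir("data/ground_truth_mask"):
        print(unpaired_stems("data/images", "data/ground_truth_mask"))
```

Two empty lists mean the base names line up; any name printed exists in only one of the two directories.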

Kyle0936 · Jul 25 '18 06:07