
ConnectionResetError: [Errno 104] Connection reset by peer

Open little2Rabbit opened this issue 6 years ago • 4 comments

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f03df4789e8>>
Traceback (most recent call last):
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/multiprocessing/queues.py", line 337, in get
    return ForkingPickler.loads(res)
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/multiprocessing/reduction.py", line 181, in recv_handle
    return recvfds(s, 1)[0]
  File "/root/anaconda3/envs/jiangxiluning-train3.5/lib/python3.5/multiprocessing/reduction.py", line 152, in recvfds
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer

epoch:3,step:65,Test loss:0.0013998962240293622,accuracy:0.0,train loss:0.0014686216600239277
The step 66,last lost:0.0014593754895031452, current: 0.0013998962240293622,save model!

little2Rabbit avatar Sep 27 '18 09:09 little2Rabbit

Have you solved this problem? I have the same problem now.

wzr1201 avatar Mar 04 '19 03:03 wzr1201

the same problem +1

guochangrong avatar Apr 02 '19 09:04 guochangrong

the same problem +1!

Shualite avatar Jul 19 '19 07:07 Shualite

I changed the --workers parameter on line 22 of train.py to 0, and no error is reported during training anymore. Change parser.add_argument('--workers', type=int, help='number of data loading workers', default=2) to parser.add_argument('--workers', type=int, help='number of data loading workers', default=0).
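
For reference, a minimal sketch of that setup, assuming train.py builds its DataLoader directly from the --workers argument; the TensorDataset below is only a placeholder, not the repository's actual lmdb dataset class:

import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset

parser = argparse.ArgumentParser()
# default=0 keeps data loading in the main process, so the worker
# fd-passing path that raised ConnectionResetError is never used
parser.add_argument('--workers', type=int, help='number of data loading workers', default=0)
opt = parser.parse_args()

# Placeholder dataset with CRNN-shaped inputs (1 x 32 x 100 images)
dataset = TensorDataset(torch.randn(8, 1, 32, 100), torch.zeros(8, dtype=torch.long))
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=opt.workers)

for images, labels in loader:
    pass  # training step would go here

Note that num_workers=0 trades data-loading throughput for stability. If you want to keep multiple workers, another option that sometimes helps with this error is switching the sharing strategy with torch.multiprocessing.set_sharing_strategy('file_system') before creating the DataLoader.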

HuangGuoHui110 avatar Nov 25 '19 08:11 HuangGuoHui110