crnn.pytorch

OSError: [Errno 4] Interrupted system call Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f1ebbb37f50>> ignored

Open fujingling opened this issue 6 years ago • 2 comments

After training runs normally for a few epochs, I always hit the error below. It happens every time. I can run train.py for several epochs and the loss keeps decreasing, but after a while the run is always interrupted. The error output is below. Could someone please take a look and tell me how to solve this problem?

Traceback (most recent call last):
  File "train.py", line 198, in <module>
    cost = trainBatch(crnn, criterion, optimizer)
  File "train.py", line 173, in trainBatch
    data = train_iter.next()
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 280, in next
    idx, batch = self._get_batch()
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/multiprocessing/queues.py", line 378, in get
    return recv()
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
    return pickle.loads(buf)
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/pickle.py", line 1388, in loads
    return Unpickler(file).load()
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/pickle.py", line 1139, in load_reduce
    value = func(*args)
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 68, in rebuild_storage_fd
    fd = multiprocessing.reduction.rebuild_handle(df)
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
    new_handle = recv_handle(conn)
  File "/export/docker/JXQ-23-46-110.h.chinabank.com.cn/fujingling/conda/envs/crnn/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
    return _multiprocessing.recvfd(conn.fileno())
OSError: [Errno 4] Interrupted system call
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f1ebbb37f50>> ignored

fujingling, Nov 26 '18 08:11

I ran into the same issue. Did you find out where the problem is?

encounter1997, Dec 26 '18 10:12

Me too. Did you solve it? @encounter1997 @fujingling

huanhuancao, Jan 06 '19 07:01