YOLOX
RuntimeError: DataLoader worker (pid(s) 197) exited
Hi, I set self.data_num_workers = 4 and used the train command: python ${workspace}/train.py -f ${train_data_dir}/yolox_voc_s.py -d 1 -b 8 -c ${weights_data_dir}/yolox_s.pth
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
2022-07-06 10:48:08 | ERROR | yolox.core.launch:98 - An error has been caught in function 'launch', process 'MainProcess' (36), thread 'MainThread' (139940411148096):
Traceback (most recent call last):
............
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
│ └ <function _MultiProcessingDataLoaderIter._next_data at 0x7f457ba23b70>
└ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f4574422550>
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
idx, data = self._get_data()
│ │ └ <function _MultiProcessingDataLoaderIter._get_data at 0x7f457ba23ae8>
│ └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f4574422550>
└ 5
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1142, in _get_data
success, data = self._try_get_data()
│ │ └ <function _MultiProcessingDataLoaderIter._try_get_data at 0x7f457ba23a60>
│ └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f4574422550>
└ False
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
└ '197'
RuntimeError: DataLoader worker (pid(s) 197) exited unexpectedly
When I change it from 4 to 0, training can continue, but it is very slow:
2022-07-06 11:00:13 | INFO | yolox.core.trainer:261 - epoch: 8/500, iter: 8730/13381, mem: 1960Mb, iter_time: 1.418s, data_time: 1.261s, total_loss: 4.1, iou_loss: 2.0, l1_loss: 0.0, conf_loss: 1.2, cls_loss: 0.8, lr: 2.499e-03, size: 512, ETA: 94 days, 20:58:37
Versions: Python 3.6.9, PyTorch 1.10.1, CUDA 10.2, driver version 460.56, NVIDIA GPU: 2080 Ti, YOLOX 0.3.0
Your log says: ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Reducing data_num_workers in your exp file might help; 2 is suggested, and 0 should be your last choice.
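A minimal sketch of where that setting lives, assuming your yolox_voc_s.py follows the layout of the bundled VOC example exp:

```python
# yolox_voc_s.py -- sketch only; class layout assumed to match YOLOX's VOC example exp
from yolox.exp import Exp as MyExp


class Exp(MyExp):
    def __init__(self):
        super().__init__()
        self.num_classes = 20   # VOC
        self.depth = 0.33       # yolox-s
        self.width = 0.50
        # Fewer DataLoader workers means fewer shared-memory consumers;
        # try 2 before falling back to 0 (single-process loading).
        self.data_num_workers = 2
```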
Are you running the code in Docker? If so, you may need to increase the shared memory (shm) in your Docker settings.
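For context: the default /dev/shm inside a Docker container is often only 64 MB, and PyTorch's DataLoader workers use it to pass tensors between processes. One way to enlarge it is shown below; the image name, mounts, and command are placeholders for your actual setup, not the exact invocation.

```bash
# Placeholder image/paths -- substitute your own; --shm-size is the relevant flag.
docker run --gpus all --shm-size=8g \
    -v /path/to/VOCdevkit:/data/VOCdevkit \
    your-yolox-image \
    python train.py -f yolox_voc_s.py -d 1 -b 8 -c yolox_s.pth
```

Passing --ipc=host instead of --shm-size also works, since the container then shares the host's shared memory.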