TCMR_RELEASE
DataLoader worker (pid 2991): Bus error.
Hi, thank you for your great work! When running your code, I got this error:
`Running TCMR on each person tracklet...
0%| | 0/5 [00:00<?, ?it/s]ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
0%| | 0/5 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 779, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 911, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2991) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/root/data/meilin/TCMR/demo.py", line 377, in
Process finished with exit code 1`

It seems that num_workers needs to be adjusted, but changing it did not help. Could you give me some guidance? Thank you!
The shared memory error message usually indicates that RAM (CPU memory) is insufficient. As I remember, the experiment normally took around 50 GB.
Try increasing RAM (more or bigger RAM modules) or, though it is not recommended, create swap space on disk.
Hi, thank you very much for your timely reply.
I have tried this, but I still get the same error. My memory is 61 GB and my shared memory is 64 MB. I use 21 CPUs and 1 GPU. Should I keep increasing them?
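For anyone hitting the same error, it may help to confirm the two numbers quoted above from inside the environment where demo.py runs. The following is only a minimal sketch that assumes a Linux system with shared memory mounted at /dev/shm; the helper names `report_memory` and `fmt` are made up for illustration and are not part of TCMR or PyTorch.

```python
import os
import shutil


def report_memory(shm_path="/dev/shm"):
    """Return (shm_total_bytes, ram_total_bytes).

    shm_total_bytes is None when shm_path does not exist.
    Assumes a POSIX system that exposes SC_PAGE_SIZE and SC_PHYS_PAGES.
    """
    shm_total = None
    if os.path.isdir(shm_path):
        # Total size of the filesystem backing the shared memory mount.
        shm_total = shutil.disk_usage(shm_path).total
    # Physical RAM = page size * number of physical pages.
    ram_total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return shm_total, ram_total


def fmt(num_bytes):
    """Format a byte count in MiB, or 'n/a' when unknown."""
    return "n/a" if num_bytes is None else f"{num_bytes / 2**20:.0f} MiB"


if __name__ == "__main__":
    shm, ram = report_memory()
    print("shared memory (/dev/shm):", fmt(shm))
    print("physical RAM:", fmt(ram))
```

A shared memory size of 64 MB happens to be Docker's default for /dev/shm. If the code runs inside a Docker container (an assumption, the thread does not say so), the limit is set when the container is started, for example with the `--shm-size` option of `docker run`, and adding RAM alone would not change it.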
First, check the exact amount of memory required with htop. I guess at least 128 GB would be safe!
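As a complement to watching htop, the peak memory of the main process can also be read from Python once the run finishes. This is a rough sketch using only the standard library `resource` module (Unix only); the function name `peak_rss_mib` is hypothetical, and where to call it in demo.py is left to the reader. It covers only the calling process, not DataLoader worker processes, so htop is still needed for the full picture.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS of this process: {peak_rss_mib():.1f} MiB")
```

Calling `peak_rss_mib()` just before the script exits gives a lower bound on the RAM the run needed, which can then be compared against the 50 GB and 128 GB figures mentioned above.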