c-swm
Data Loading Error
When I run the code provided in this repository, I hit a RuntimeError at https://github.com/tkipf/c-swm/blob/e944b24bcaa42d9ee847f30163437a50f0237aa0/train.py#L104 while following your Shapes-2D build/train steps: python train.py --dataset data/shapes_train.h5 --encoder small --name shapes
Specifically, the error says: RuntimeError: DataLoader worker is killed by signal: Killed.
It seems to come from the DataLoader worker processes failing while they load the training set in parallel. But when I look through your utils file, I don't see why this error would occur.
The same error occurs on both a Windows machine and a Linux machine.
Here's the full trace:
Traceback (most recent call last):
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/queues.py", line 104, in get
    if not self._poll(timeout):
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 414, in _poll
    r = wait([self], timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 19014) is killed by signal: Killed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 104, in <module>
    obs = train_loader.__iter__().next()[0]
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/aadharna/miniconda3/envs/cswm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 737, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 19012, 19014) exited unexpectedly
(cswm) aadharna@penguin:~/PycharmProjects/c-swm$
I'll continue poking around and update when I find the root cause.
Same situation here. Maybe there isn't enough RAM to run the training?
Yes, it is a RAM problem.
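For anyone else who lands here: "killed by signal: Killed" usually means the operating system stopped the worker process, and running out of memory is a common cause. Below is a rough sketch of one way to choose a worker count that fits in RAM. It is not part of this repository; the helper name `pick_num_workers`, the sizes used, and the worst-case assumption that the main process and every worker each end up holding a full copy of the dataset are all my own assumptions for illustration.

```python
def pick_num_workers(dataset_bytes, available_bytes, requested_workers):
    """Return the largest worker count (<= requested_workers) that fits in RAM.

    Hypothetical helper. Assumes the worst case where the main process and
    each worker hold one full copy of the dataset. A result of 0 means
    "load everything in the main process".
    """
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    if requested_workers < 0:
        raise ValueError("requested_workers must be non-negative")
    # How many full copies of the dataset fit into the available memory.
    max_copies = available_bytes // dataset_bytes
    # One copy is reserved for the main process; the rest can go to workers.
    affordable_workers = max_copies - 1
    return max(0, min(requested_workers, affordable_workers))


GIB = 1024 ** 3

# Made-up numbers: a 3 GiB in-memory dataset on a machine with 8 GiB free.
num_workers = pick_num_workers(3 * GIB, 8 * GIB, requested_workers=4)
print(num_workers)  # prints 1
```

The result would then be passed as the `num_workers` argument of `torch.utils.data.DataLoader`, where 0 means the data is loaded in the main process with no worker subprocesses. If the dataset does not fit even with 0 workers, the only options left are more RAM or a smaller training set.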