
Topaz training error in multiprocessing data loader

tbepler opened this issue 1 year ago (status: open, 0 comments)

Discussed in https://github.com/tbepler/topaz/discussions/172

Originally posted by wuhucryoem on July 24, 2023:

When I run topaz training, it reports that a file or directory does not exist, but it does not say which file or directory. The output looks like this:

Traceback (most recent call last):
  File "/home/amax/miniconda3/envs/topaz/bin/topaz", line 33, in <module>
    sys.exit(load_entry_point('topaz-em==0.2.5', 'console_scripts', 'topaz')())
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
    args.func(args)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 695, in main
    , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 577, in fit_epochs
    , use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 552, in fit_epoch
    for X,Y in data_iterator:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
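The error names no file because the failing call is `s.connect(address)` in `multiprocessing/connection.py`, not a file open. A minimal sketch, assuming Linux, where `multiprocessing` connections typically use Unix-domain socket addresses (filesystem paths), shows the behaviour in isolation: when `socket.connect()` targets a socket path that does not exist, Python raises `FileNotFoundError` with `filename` left as `None`, so the message never includes the path. The helper `connect_to_missing_unix_socket` and the path name `listener-gone` are hypothetical; this reproduces only the last frame of the traceback, not the Topaz/PyTorch failure itself.

```python
import errno
import os
import socket
import tempfile


def connect_to_missing_unix_socket():
    """Hypothetical helper: connect an AF_UNIX socket to a path that does not exist.

    Mirrors only the final frame of the traceback (s.connect(address));
    it is not Topaz or PyTorch code. Requires a platform with AF_UNIX (e.g. Linux).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # A path inside a fresh, empty temp directory, so nothing is listening there.
        missing = os.path.join(tmp, "listener-gone")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            try:
                s.connect(missing)
            except FileNotFoundError as exc:
                # Return the path we tried alongside the exception for inspection.
                return missing, exc
    raise RuntimeError("connect() unexpectedly succeeded")


if __name__ == "__main__":
    path, exc = connect_to_missing_unix_socket()
    print(repr(exc))                       # the message does not mention `path`
    print(exc.errno == errno.ENOENT, exc.filename)
```

Run as a script, it prints the exception and shows that `filename` is unset. In the traceback from the report, the path that could not be found would be the `address` passed to `Client(...)` from `resource_sharer.get_connection`, which the standard error message does not display.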

Posted by tbepler, Jul 24 '23 06:07