
Training stops unexpectedly at iteration 3618

lajictw opened this issue 1 year ago · 2 comments

Hi. I really like this work and its use of the molecular fragmentation idea. However, when I try to reproduce it, the data iterator reports that there are no more elements at iteration 3618 and training stops. The only changes I made to your code are wrapping the training loop in tqdm to observe progress, setting `batch_size` to 64, and setting `num_workers` to 8. The error message is attached below. I suspect it is some kind of setup error on my side, but I'm not sure. Please reply at your convenience if possible. Thanks!
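For completeness, my changes look roughly like this (a sketch only; the actual loader and loop construction in scripts/train_drug3d.py may differ, the repo may use a torch_geometric loader rather than the plain torch `DataLoader` shown here, and `train_set`, `inf_iterator`, and `train` refer to the repo's own objects):

```python
from torch.utils.data import DataLoader  # the repo may use torch_geometric's loader instead
from tqdm import tqdm

# The two settings I changed from the defaults
train_loader = DataLoader(
    train_set,       # training split built by the repo's dataset code (name assumed)
    batch_size=64,   # changed
    num_workers=8,   # changed
    shuffle=True,
)
train_iterator = inf_iterator(train_loader)  # infinite wrapper from utils/train.py

# tqdm added only so I can watch progress; the loop body itself is untouched
for it in tqdm(range(1, 110000 + 1), desc='Training'):  # 110000 total iterations, per the progress bar
    train(it)
```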

```
[2023-10-23 01:37:14,919 :: train :: INFO] [Train] Iter 3617 | loss: 1.285635 | loss_pos: 1.181161 | loss_node: 0.091807 | loss_edge: 0.012668
Training:   3%|██████▏   | 3617/110000 [1:02:57<29:16:15, 1.01it/s]
[2023-10-23 01:37:15,320 :: train :: INFO] [Train] Iter 3618 | loss: 1.513661 | loss_pos: 1.436085 | loss_node: 0.067695 | loss_edge: 0.009880
Training:   3%|██████▏   | 3618/110000 [1:02:58<30:51:47, 1.04s/it]
Traceback (most recent call last):
  File "D:\PaperCode\MolDiff\.\utils\train.py", line 51, in inf_iterator
    yield iterator.__next__()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 1318, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PaperCode\MolDiff\scripts\train_drug3d.py", line 172, in <module>
    train(it)
  File "D:\PaperCode\MolDiff\scripts\train_drug3d.py", line 90, in train
    batch = next(train_iterator).to(args.device)
  File "D:\PaperCode\MolDiff\.\utils\train.py", line 53, in inf_iterator
    iterator = iterable.__iter__()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 1042, in __init__
    w.start()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object

(GNN1) D:\PaperCode\MolDiff>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
```
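From the traceback, the helper in utils/train.py looks like an infinite-iterator wrapper, roughly as in the sketch below (reconstructed only from the frames at lines 51 and 53 above, so the real code may differ). When the DataLoader is exhausted at the end of an epoch it raises StopIteration, the wrapper builds a fresh iterator, and on Windows that spawns new worker processes, which requires pickling the dataset; the pickling then fails on the 'Environment' object, presumably the dataset's LMDB environment handle.

```python
# Rough reconstruction of inf_iterator in utils/train.py, based only on the
# traceback frames above (lines 51 and 53); the actual implementation may differ.
def inf_iterator(iterable):
    iterator = iterable.__iter__()
    while True:
        try:
            yield iterator.__next__()        # raises StopIteration once an epoch ends
        except StopIteration:
            iterator = iterable.__iter__()   # new iterator -> newly spawned workers on Windows
```

If that reading is right, the crash is triggered the first time the loader is exhausted and re-created, and my guess is that setting `num_workers` back to 0 (or opening the LMDB environment lazily in `__getitem__` rather than in `__init__`) would avoid the pickling step, but I'm not sure whether that is the intended setup.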

lajictw · Oct 23 '23 17:10