
Getting error when running vae_train.py

Open SejeongPark8354 opened this issue 4 years ago • 4 comments

First of all, thank you for your great research on molecule generation. I am training on my ZINC dataset with your vae_train.py (in the generation folder). When I run the code, I get the error below. It occurs only occasionally, so I think it depends on the batch. Is there a solution for this problem?

  warnings.warn(warning.format(ret))
Model #Params: 160850K
[50] Beta: 0.100, KL: 19.11, loss: 57.167, Word: 10.76, 52.60, Topo: 80.77, Assm: 56.73, PNorm: 175.70, GNorm: 18.64
[100] Beta: 0.100, KL: 9.08, loss: 42.075, Word: 14.69, 59.69, Topo: 93.39, Assm: 75.03, PNorm: 236.81, GNorm: 14.60
[150] Beta: 0.100, KL: 9.66, loss: 39.316, Word: 16.71, 62.58, Topo: 96.62, Assm: 77.06, PNorm: 293.82, GNorm: 17.42
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [85,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [86,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [87,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [88,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [102,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [103,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [104,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [105,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [106,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [107,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [108,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [109,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [110,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [111,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [112,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [113,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [114,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [115,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [117,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [118,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [119,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [120,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [32,0,0], thread: [121,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "vae_train.py", line 81, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/hgnn.py", line 88, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 130, in forward
    hatom,_ = self.graph_encoder(*tensors)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/encoder.py", line 30, in forward
    h = self.rnn(fmess, bgraph)
  File "/home/sejeong/anaconda3/envs/PSJ/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 105, in forward
    h,c = self.LSTM(fmess, h_nei, c_nei)
  File "/home/sejeong/hgraph2graph/generation/poly_hgraph/rnn.py", line 92, in LSTM
    c = i * u + (f * c_nei).sum(dim=1)
RuntimeError: CUDA error: device-side assert triggered

SejeongPark8354 avatar Dec 09 '20 10:12 SejeongPark8354

I am also getting the above issue. Did you manage to find a fix, @SejeongPark8354?

jks17 avatar Mar 01 '21 22:03 jks17

I'm getting a very similar issue when running train_generator.py:

Namespace(anneal_iter=25000, anneal_rate=0.9, atom_vocab=<hgraph.vocab.Vocab object at 0x000001C10639ED48>, batch_size=20, clip_norm=5.0, depthG=15, depthT=15, diterG=3, diterT=1, dropout=0.0, embed_size=250, epoch=20, hidden_size=125, kl_anneal_iter=2000, latent_size=32, load_model=None, lr=0.001, max_beta=1.0, print_iter=50, rnn_type='LSTM', save_dir='ckpt/cyclic_truncated_pretrained', save_iter=5000, seed=7, step_beta=0.001, train='train_processed/cyclic_truncated_processed/', vocab='data/chembl/cyclic_peptide_vocab_truncated.txt', warmup=10000)
C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Model #Params: 1318K
  0%|▏                                                                              | 2/1000 [00:32<4:01:50, 14.54s/it]C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [40,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [41,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [42,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [43,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [146,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [44,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [45,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [46,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [47,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [48,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [49,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [50,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [51,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [52,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [53,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [54,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [55,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [56,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [57,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [58,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [59,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [60,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [61,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [62,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\cuda\Indexing.cu:975: block: [148,0,0], thread: [63,0,0] Assertion “srcIndex < srcSelectDimSize” failed.
  0%|▏                                                                              | 2/1000 [00:35<4:56:07, 17.80s/it]
Traceback (most recent call last):
  File "train_generator.py", line 92, in <module>
    loss, kl_div, wacc, iacc, tacc, sacc = model(*batch, beta=beta)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\hgnn.py", line 55, in forward
    root_vecs, tree_vecs, _, graph_vecs = self.encoder(tree_tensors, graph_tensors)
  File "C:\Users\Marshall\Anaconda3\envs\hgraph-rdkit\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 129, in forward
    tensors = self.embed_graph(graph_tensors)
  File "C:\Users\Marshall\hgraph2graph-master\hgraph\encoder.py", line 114, in embed_graph
    fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)
RuntimeError: CUDA error: device-side assert triggered

marshallcase avatar Aug 31 '22 21:08 marshallcase

Actually, I think I figured it out. There's a parameter defined in mol_graph.py, MAX_POS = 20, which sets the size of the E_apos and E_pos matrices. In the encoder, the position indices in fmess can then fall outside that range, which is why you get the error.

I think it's an issue of molecule size and graph complexity. The paper has a footnote on this: "The number of possible attachments are limited because the number of attaching atoms between two motifs is small and the attaching points must be consecutive." (footnote 3)

Footnote 3: "In our experiments, the number of possible attachments are usually less than 20 for polymers and small molecules."
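
To illustrate the MAX_POS explanation, here is a minimal sketch of a bounds check that could be run on a batch before it reaches the GPU. It is plain Python and not code from the repository: `find_out_of_range` and the sample `positions` list are hypothetical, and the value 20 is the MAX_POS reported in this thread. With PyTorch, the position column could presumably be obtained with something like `fmess[:, 3].tolist()`.

```python
def find_out_of_range(indices, table_size):
    """Return (row, index) pairs whose index falls outside [0, table_size)."""
    return [(row, idx) for row, idx in enumerate(indices)
            if idx < 0 or idx >= table_size]

MAX_POS = 20  # value reported for mol_graph.py in this thread

# Hypothetical position column, standing in for fmess[:, 3]
positions = [0, 3, 19, 22, 7]

bad = find_out_of_range(positions, MAX_POS)
print(bad)  # rows that would trip the srcIndex < srcSelectDimSize assertion
```

An empty result means every index fits in a table with MAX_POS rows; any entry in the list points at a molecule whose attachment positions exceed the limit.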

marshallcase avatar Aug 31 '22 22:08 marshallcase

I agree with the advice above. I first set "os.environ['CUDA_LAUNCH_BLOCKING'] = '1'" to locate the bug and found that the problem is in "fpos = self.E_apos.index_select(index=fmess[:, 3], dim=0)". I then sliced the tensor to find where the error is: the maximum value of fmess[:, 3] is 22, while self.E_apos only has 20 rows. So I increased MAX_POS in mol_graph.py, which solved the problem. I don't think this change affects the model; it may just waste some memory.
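
A rough sketch of the two steps described here, under stated assumptions: the environment variable is set at the very top of the training script, before CUDA is initialised, so that kernels run synchronously and the traceback points at the failing call. `required_table_size` is a hypothetical helper, not part of hgraph2graph; it only shows that an observed maximum index of 22 needs a table of at least 23 rows.

```python
import os

# Set this before CUDA is initialised (safest: before importing torch),
# so the Python traceback lines up with the kernel that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def required_table_size(indices):
    """Smallest number of rows that makes every index a valid lookup."""
    return max(indices) + 1

# Hypothetical position values; 22 is the maximum reported in this thread.
observed = [0, 5, 22]
needed = required_table_size(observed)
print(needed)  # MAX_POS must be at least this large
```

Choosing a MAX_POS somewhat above the computed minimum leaves headroom for molecules in later batches, at the cost of a slightly larger position table.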

Bunnybeibei avatar Feb 21 '23 01:02 Bunnybeibei