GraphCL

Error when running gsimclr.py --DS ENZYMES --lr 0.01 --local --num-gc-layers 3 --aug random4 --seed 0

Open Austinzhenghua opened this issue 3 years ago • 13 comments

600 1

lr: 0.01 num_features: 1 hidden_dim: 32 num_gc_layers: 3

/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gsimclr.py", line 190, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 76, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 52, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/gin_conv.py", line 64, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
    out = self.aggregate(out, **aggr_kwargs)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 288, in aggregate
    reduce=self.aggr)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 153, in scatter
    return scatter_sum(src, index, dim, out, dim_size)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 21, in scatter_sum
    return out.scatter_add(dim, index, src)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Can anyone help me figure out what is wrong with the algorithm or the environment?

The environment is as follows:

Jinja2 3.0.1 3.0.1
MarkupSafe 2.0.1 2.0.1
Pillow 8.2.0 8.2.0
PySocks 1.7.1 1.7.1
brotlipy 0.7.0 0.7.0
certifi 2020.6.20 2021.5.30
cffi 1.14.5 1.14.5
chardet 4.0.0 4.0.0
cryptography 3.4.7 3.4.7
cycler 0.10.0 0.10.0
decorator 4.4.2 5.0.9
googledrivedownloader 0.4 0.4
idna 2.10 3.2
joblib 1.0.1 1.0.1
kiwisolver 1.3.1 1.3.1
matplotlib 3.4.2 3.4.2
mkl-fft 1.3.0 1.3.0
mkl-random 1.2.1 1.2.2
mkl-service 2.3.0 2.4.0
networkx 2.5.1 2.6rc2
numpy 1.20.2 1.21.0
olefile 0.46 0.47.dev4
pandas 1.2.5 1.3.0rc1
pip 21.1.2 21.1.3
pyOpenSSL 20.0.1 20.0.1
pycparser 2.20 2.20
pyparsing 2.4.7 3.0.0b2
python-dateutil 2.8.1 2.8.1
python-louvain 0.15 0.15
pytz 2021.1 2021.1
requests 2.25.1 2.25.1
scikit-learn 0.24.2 0.24.2
scipy 1.6.2 1.7.0
seaborn 0.11.0 0.11.1
setuptools 52.0.0.post20210125 57.0.0
six 1.16.0 1.16.0
threadpoolctl 2.1.0 2.1.0
torch 1.9.0 1.9.0
torch-cluster 1.5.9 1.5.9
torch-geometric 1.7.2 1.7.2
torch-scatter 2.0.7 2.0.7
torch-sparse 0.6.10 0.6.10
torch-spline-conv 1.2.1 1.2.1
torchaudio 0.9.0a0+33b2469 0.9.0
torchvision 0.10.0 0.10.0
tornado 6.1 6.1
tqdm 4.61.1 4.61.1
typing-extensions 3.7.4.3 3.10.0.0
urllib3 1.26.6 1.26.6
wheel 0.36.2 0.36.2

Austinzhenghua, Jun 29 '21 07:06
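
The log above itself suggests passing CUDA_LAUNCH_BLOCKING=1, because CUDA kernel errors are reported asynchronously and the Python stack trace may point at the wrong call. A minimal sketch of one way to do that from Python, assuming the variable is set at the very top of the entry script before any CUDA work happens; the helper name `enable_blocking_cuda_launches` is hypothetical and not part of GraphCL or PyTorch:

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Request synchronous CUDA kernel launches so errors surface at the failing call.

    Has to run before the first CUDA call; the safest place is before importing torch.
    `env` defaults to the process environment (a plain dict can be passed for testing).
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_cuda_launches()
# import torch  # import torch (and run the model) only after the variable is set
```

Setting the variable on the command line when launching the script should have the same effect; either way the run will be slower, so this is only meant for debugging.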

Hi @Austinzhenghua,

Thanks for your feedback. Does torch_geometric==1.7.2 not work for you? You could try version 1.6.0 or 1.6.1 for this experiment.

yyou1996, Jun 29 '21 13:06

Hi, could I have your WeChat so I can ask you some more detailed questions?

Austinzhenghua, Jun 29 '21 14:06

Just as a test, are you able to run https://github.com/fanyun-sun/InfoGraph/tree/master/unsupervised, which the unsupervised_TU experiment is built on?

yyou1996, Jun 29 '21 18:06

Yes, I can run this algorithm, but it seems it did not use the GPU for training. The error above did cause by the version of torch_geometric. Can you run it on your computer? Thanks a lot!

Austinzhenghua, Jun 30 '21 05:06

Traceback (most recent call last):
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gsimclr.py", line 189, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 77, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 52, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/gin_conv.py", line 63, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
    kwargs)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 158, in collect
    j if arg[-2:] == '_j' else i)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 127, in lift
    return src.index_select(self.node_dim, index)
RuntimeError: index out of range: Tried to access index 4324 out of table with 4323 rows. at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

I get this error when I run it on the CPU.

Austinzhenghua, Jun 30 '21 06:06
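
The CPU message "Tried to access index 4324 out of table with 4323 rows" indicates that some entry of edge_index refers to a node that has no row in x. A minimal sketch of a bounds check that could be run on a batch before the forward pass; it uses plain Python lists so it stays self-contained, and the helper name `find_out_of_range_edges` is hypothetical. With tensors, a comparable check would compare `edge_index.max()` against `x.size(0)`.

```python
def find_out_of_range_edges(edge_index, num_nodes):
    """Return (column, node_id) pairs whose node id is not a valid row index.

    edge_index: two equal-length lists [sources, targets] (a COO edge list).
    num_nodes:  number of rows in the node feature matrix x.
    """
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have equal length")
    bad = []
    for col, (src, dst) in enumerate(zip(sources, targets)):
        for node_id in (src, dst):
            if not 0 <= node_id < num_nodes:
                bad.append((col, node_id))
    return bad


# Toy data mirroring the log: x has 4323 rows, but one edge points at node 4324.
edges = [[0, 1, 4324], [1, 0, 2]]
print(find_out_of_range_edges(edges, num_nodes=4323))
```

An empty result means every edge endpoint has a matching row in x; any pair returned points at the edge column to inspect. This only locates the mismatch and does not say which component produced it.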

[Two screenshots showing the shape of x; images not preserved]

I found that the shape of x differs between your algorithm and InfoGraph. The first screenshot is from InfoGraph.

Austinzhenghua, Jun 30 '21 07:06

It works well on my machine. What command did you use? Please take a look at the README: https://github.com/Shen-Lab/GraphCL/tree/master/unsupervised_TU#readme.

yyou1996, Jul 01 '21 13:07

I have the same error. Have you fixed it?

ztk1996, Sep 06 '21 09:09

Hi @ztk1996,

As I remember, I tested the command and it worked fine on my machine. Could you also share your environment and the command you ran?

yyou1996, Sep 07 '21 01:09

Thanks for your reply. The error I get when I run "./go.sh 1 AIDS subgraph" on the CPU is as follows.

+ for seed in 0 1 2 3 4
+ CUDA_VISIBLE_DEVICES=1
+ python gsimclr.py --DS AIDS --lr 0.01 --local --num-gc-layers 3 --aug subgraph --seed 0
dataset length: 2000 1
================
lr: 0.01 num_features: 1 hidden_dim: 32 num_gc_layers: 3
================
Traceback (most recent call last):
  File "gsimclr.py", line 188, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 89, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 62, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/gin_conv.py", line 64, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
    coll_dict = self.collect(self.user_args, edge_index, size,
  File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 157, in collect
    data = self.lift(data, edge_index,
  File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 127, in lift
    return src.index_select(self.node_dim, index)
IndexError: index out of range in self

torch: 1.7.0 torch-geometric: 1.7.2

ztk1996, Sep 07 '21 02:09

@ztk1996

Please try running with torch-geometric==1.6.0 on a GPU. Since both of you are using torch-geometric>=1.7.0 on the CPU, I suspect that might be the source of the error.

yyou1996, Sep 07 '21 02:09

I tried running with torch_geometric==1.6.0 and pytorch==1.7.0 on a GPU. The error output is as follows.

+ for seed in 0 1 2 3 4
+ CUDA_VISIBLE_DEVICES=0
+ python gsimclr.py --DS AIDS --lr 0.01 --local --num-gc-layers 3 --aug subgraph --seed 0
dataset length: 2000 1
================
lr: 0.01 num_features: 1 hidden_dim: 32 num_gc_layers: 3
================
Traceback (most recent call last):
  File "gsimclr.py", line 188, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 89, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 62, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch_geometric/nn/conv/gin_conv.py", line 69, in forward
    return self.nn(out)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.

Besides, when I run with torch_geometric==1.6.0 and pytorch==1.7.0 on the CPU, the error is the same as with torch_geometric==1.7.2.

ztk1996, Sep 07 '21 06:09

@ztk1996

My impression is that the versions of torch_geometric and pytorch should be consistent with each other (see https://github.com/rusty1s/pytorch_geometric). If using torch_geometric==1.6, I would also use pytorch==1.6. Please let me know if this does not work either. Thanks.

yyou1996, Sep 07 '21 16:09
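
Since the discussion turns on which torch and torch_geometric versions are installed together, it can help to print them from the exact environment that runs the script. A minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `installed_versions` and the default package list are illustrative choices, not something prescribed by GraphCL:

```python
from importlib import metadata


def installed_versions(
    packages=("torch", "torch-geometric", "torch-scatter", "torch-sparse"),
):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the command that was run gives the details requested earlier in the thread. It only reports what is installed; whether a given pair of versions is compatible still has to be checked against the torch_geometric installation notes.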