GraphCL

Unsupervised learning with a self-created dataset

Open LA11131110128 opened this issue 3 years ago • 1 comments

I tried my own dataset with your unsupervised learning framework. The number of edges (num_of_edge) in it exceeds 10^6. When I load the data, I get an assertion error.


loading GCC 7.3.1 based on SCL Developer Toolset 7


loading CUDA 10.1 with cuDNN / NCCL based on cntr cuda:10.1-cudnn7-devel-centos7

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [21,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [21,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [21,0,0], thread: [51,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [20,0,0], thread: [88,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [20,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2, IndexIsMajor = true]: block: [20,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
Processing...
Done!
5264 1

lr: 0.01 num_features: 1 hidden_dim: 32 num_gc_layers: 4

dataset_num_classes: 7
Traceback (most recent call last):
  File "gsimclr.py", line 189, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/u8411596/GraphCL-master/unsupervised_TU/gin.py", line 83, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/u8411596/GraphCL-master/unsupervised_TU/gin.py", line 56, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/u8411596/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/u8411596/.conda/envs/py36/lib/python3.6/site-packages/torch_geometric/nn/conv/gin_conv.py", line 67, in forward
    out += (1 + self.eps) * x_r
RuntimeError: CUDA error: device-side assert triggered

I am wondering whether the learning framework has a limit on the size of the data, and I would appreciate any suggestions for solving this problem. Thank you!

LA11131110128, Sep 17 '22 19:09

Hi @LA11131110128,

It looks like the error comes from a mismatch between the GNN and your customized data (though I am not sure exactly where). I would suggest checking the defined GIN architecture (input_node_dimension, etc.) and confirming that it matches your data.
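One way to run that comparison is to check the width of every node-feature row against the input size the encoder was built with. The helper below is only a sketch and is not part of GraphCL: the function name and the nested-list input format are assumptions made for illustration. For a PyTorch Geometric graph you would pass `data.x.tolist()`, and the expected width of 1 mirrors the `num_features: 1` line in the log above.

```python
def feature_width_mismatches(graphs_x, expected_num_features):
    """Return (graph_index, widths_found) for every graph whose node
    feature rows do not all have the expected width.

    graphs_x: list of node-feature matrices, one per graph, each given
              as a list of rows (hypothetical input format).
    A graph with no nodes is also reported, with an empty width list.
    """
    mismatches = []
    for graph_idx, x in enumerate(graphs_x):
        widths = {len(row) for row in x}
        if widths != {expected_num_features}:
            mismatches.append((graph_idx, sorted(widths)))
    return mismatches


# Two toy graphs: the second one has 2 features per node instead of 1.
graphs_x = [
    [[1.0], [1.0]],
    [[1.0, 0.0]],
]
print(feature_width_mismatches(graphs_x, expected_num_features=1))
# prints [(1, [2])]
```

An empty result means every graph matches the encoder's input dimension, so the problem is more likely to be in the edge indices.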

It may also help to print the shapes of x and edge_index and check whether the largest index in edge_index reaches or exceeds the number of nodes (valid node indices run from 0 to num_nodes - 1).
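The index check can be sketched as follows. This is a hypothetical helper, not code from GraphCL or PyTorch Geometric: it takes `edge_index` as two plain lists (sources and targets) so it runs without a GPU, and the function name is an assumption. The assertion `srcIndex < srcSelectDimSize` in the log is consistent with at least one entry failing this kind of range check, although the log alone does not confirm which graph is responsible.

```python
def find_bad_edge_indices(edge_index, num_nodes):
    """Return (row, column, value) for every entry of edge_index that
    is not a valid node id, i.e. not in the range 0 .. num_nodes - 1.

    edge_index: two lists [sources, targets], the same 2 x num_edges
                layout that data.edge_index uses.
    """
    bad = []
    for row, endpoints in enumerate(edge_index):
        for col, node in enumerate(endpoints):
            if node < 0 or node >= num_nodes:
                bad.append((row, col, node))
    return bad


# Toy graph with 3 nodes (valid ids 0..2); the edge 1 -> 3 is out of range.
edge_index = [[0, 1, 1],
              [1, 2, 3]]
print(find_bad_edge_indices(edge_index, num_nodes=3))
# prints [(1, 2, 3)]
```

For a graph object from the dataset, something like `find_bad_edge_indices(data.edge_index.tolist(), data.x.size(0))` should work. If every graph reports an index equal to its node count, one possible cause is node ids that start at 1 instead of 0 in the raw files.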

yyou1996, Sep 18 '22 17:09