Random errors when loading OGB datasets
🐛 Describe the bug
I receive a random IndexError: index 242823520 is out of bounds for dimension 0 with size 123718280 when loading ogbn-products with
dataset = PygNodePropPredDataset(name='ogbn-products', transform=T.ToSparseTensor())
The error does not happen every time; sometimes it appears and sometimes it doesn't. It started after I recently updated my PyTorch and PyTorch Geometric versions.
I reinstalled with pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html --upgrade --force-reinstall, but that didn't fix it.
Environment
- PyG version: 2.0.4
- PyTorch version: 1.11.0+cu113
- OS: Ubuntu 18.04
- Python version: 3.8.5
- CUDA/cuDNN version:
- How you installed PyTorch and PyG (conda, pip, source): pip
- Any other relevant information (e.g., version of torch-scatter): installed with pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
Can you try removing the processed/ data of the OGB datasets when switching to a different PyG version? Hopefully that already resolves your issue.
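For reference, deleting only the processed cache forces PyG to re-process the dataset on the next load without re-downloading the raw files. A minimal sketch, assuming the default dataset root layout (the path below is an assumption; adjust it to wherever your dataset actually lives):

```shell
# Hypothetical dataset root used by PygNodePropPredDataset; adjust as needed.
DATASET_ROOT="./dataset/ogbn_products"

# Remove only the processed cache; the raw/ download is kept,
# so the dataset is re-processed but not re-downloaded.
rm -rf "${DATASET_ROOT}/processed"
```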
Sorry for the late reply. The issue is still not fixed; I also tried pytorch=1.12.1+cu113, but it still doesn't work.
I found that this issue happens whenever the dataset is large. For example, with the Reddit dataset from PyG, I get the following error when creating a torch_sparse.SparseTensor:
~/GNN_models.py in get_feats(self, data, y_pred)
73 def get_feats(self, data, y_pred=None):
74
---> 75 adj_t = torch_sparse.SparseTensor(
76 row = data.edge_index[0].long(),
77 col = data.edge_index[1].long(),
~/anaconda3/lib/python3.8/site-packages/torch_sparse/tensor.py in __init__(self, row, rowptr, col, value, sparse_sizes, is_sorted, trust_data)
24 trust_data: bool = False,
25 ):
---> 26 self.storage = SparseStorage(
27 row=row,
28 rowptr=rowptr,
~/anaconda3/lib/python3.8/site-packages/torch_sparse/storage.py in __init__(self, row, rowptr, col, value, sparse_sizes, rowcount, colptr, colcount, csr2csc, csc2csr, is_sorted, trust_data)
66 assert rowptr.numel() - 1 == M
67 elif row is not None and row.numel() > 0:
---> 68 assert trust_data or int(row.max()) < M
69
70 N: int = 0
AssertionError:
I have also seen the above error before when using OGB, but it pops up randomly ...
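The assertion that fails here (assert trust_data or int(row.max()) < M in torch_sparse/storage.py) is a bounds check: every row index must be smaller than the matrix height. A dependency-free sketch of the same idea, using plain Python lists in place of the edge_index tensors (check_edge_index is a made-up helper, not part of torch_sparse):

```python
def check_edge_index(row, col, num_nodes):
    """Return the positions of edges whose endpoints fall outside
    [0, num_nodes). Mirrors the bounds check that torch_sparse's
    SparseStorage performs; a corrupted index trips it."""
    return [i for i, (r, c) in enumerate(zip(row, col))
            if not (0 <= r < num_nodes and 0 <= c < num_nodes)]

# A healthy 4-node graph passes ...
assert check_edge_index([0, 1, 2], [1, 2, 3], num_nodes=4) == []
# ... while a single corrupted high bit in one index is caught.
assert check_edge_index([0, 1, 2 + (1 << 55)], [1, 2, 3], num_nodes=4) == [2]
```

Running a check like this right after loading a dataset can separate "the data on disk is corrupt" from "the data gets corrupted later in memory".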
Do you have a reproducible example on Reddit?
Yes, here is a very simple example. The error happens randomly; I'd say there is about a 50% chance of seeing it.
import os.path as osp

from torch_geometric.datasets import Reddit
from torch_geometric.loader import ShaDowKHopSampler

path = osp.join('./data', 'Reddit')
dataset = Reddit(root=path)
data = dataset[0]

train_loader = ShaDowKHopSampler(data, depth=2, num_neighbors=5,
                                 batch_size=256, num_workers=10, shuffle=True)
Since ShaDowKHopSampler constructs a SparseTensor internally, it sometimes gives me this error ...
I also get other random errors of this kind with the Reddit dataset:
~/anaconda3/lib/python3.8/site-packages/torch_geometric/loader/shadow.py in __init__(self, data, depth, num_neighbors, node_idx, replace, **kwargs)
49 self.is_sparse_tensor = False
50 row, col = data.edge_index.cpu()
---> 51 self.adj_t = SparseTensor(
52 row=row, col=col, value=torch.arange(col.size(0)),
53 sparse_sizes=(data.num_nodes, data.num_nodes)).t()
~/anaconda3/lib/python3.8/site-packages/torch_sparse/transpose.py in <lambda>(self)
32
33
---> 34 SparseTensor.t = lambda self: t(self)
35
36 ###############################################################################
~/anaconda3/lib/python3.8/site-packages/torch_sparse/transpose.py in t(src)
11
12 if value is not None:
---> 13 value = value[csr2csc]
14
15 sparse_sizes = src.storage.sparse_sizes()
IndexError: index 36028797059905686 is out of bounds for dimension 0 with size 114615892
Thank you. I tried to reproduce this but failed :( I suspect this has something to do with a broken torch-sparse installation. Can you remove the dependency and reinstall it from source via pip install --verbose torch-sparse? Sorry for the inconvenience!
Problem solved. It turns out my RAM was broken ... I never thought it could be due to a hardware issue. Thank you for your time.
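That diagnosis fits the traceback above: the giant index 36028797059905686 equals 40941718 + 2**55, i.e. it differs from a plausible in-range index by exactly one high bit, which is characteristic of a flipped bit in memory. A proper diagnosis needs a dedicated tool such as memtest86+, but a crude software-level smoke test can be sketched in a few lines (pattern_check is a made-up helper, not a real diagnostic):

```python
def pattern_check(n_bytes: int, pattern: int = 0xA5) -> int:
    """Fill a buffer with a fixed byte pattern, read it back, and return
    the number of bytes that no longer match. On healthy hardware this is
    always 0; sporadic nonzero results hint at memory corruption."""
    buf = bytearray([pattern]) * n_bytes
    return n_bytes - buf.count(pattern)

# Exercise a few hundred MB; a real test would cycle through most of the
# free RAM with multiple patterns and passes.
assert pattern_check(256 * 1024 * 1024) == 0
```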