AssertionError in matmul (assert int(col.max()) < N)

Open minsikseo-cdl opened this issue 2 years ago • 6 comments

Hi, I just hit an AssertionError while using matmul. Here is the error message:

File "/workspace/networks.py", line 37, in spspmm
    C = matmul(A, B)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 139, in matmul
    return spspmm(src, other, reduce)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 116, in spspmm
    return spspmm_sum(src, other)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/matmul.py", line 106, in spspmm_sum
    sparse_sizes=(M, K), is_sorted=True)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/tensor.py", line 26, in __init__
    is_sorted=is_sorted)
  File "/opt/conda/lib/python3.7/site-packages/torch_sparse/storage.py", line 76, in __init__
    assert int(col.max()) < N
AssertionError

And my SparseTensors are:

A
> SparseTensor(row=tensor([     0,      0,      0,  ..., 493398, 493398, 493398], device='cuda:0'),
             col=tensor([     0,   4946,   4947,  ..., 493396, 493397, 493398], device='cuda:0'),
             val=tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0'),
             size=(493399, 493399), nnz=3576315, density=0.00%)
B
> SparseTensor(row=tensor([     0,      0,      0,  ..., 493398, 493398, 493398], device='cuda:0'),
             col=tensor([     0,   4946,   4947,  ..., 493396, 493397, 493398], device='cuda:0'),
             val=tensor([1., 1., 1.,  ..., 1., 1., 1.], device='cuda:0'),
             size=(493399, 493399), nnz=3576315, density=0.00%)

In fact, A and B are identical, so what I want is simply the sparse matrix power of A (here, A multiplied by itself).

When I check the row and column indices and the sparse sizes of A, nothing seems wrong. The identical operation using torch.sparse.mm with torch.sparse_coo_tensor even gives the right result. (However, torch.sparse.mm seems to require more memory than torch_sparse.matmul, so I can't run it on the GPU.)
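
A cross-check of this kind can also be done without either library on a small slice of the matrix. The sketch below is a plain-Python reference product over (row, col, value) triples. `spspmm_reference` is a hypothetical name used only for illustration, not a torch_sparse function, and it is only practical for toy inputs, not for a matrix with millions of nonzeros.

```python
from collections import defaultdict


def spspmm_reference(a, b):
    """Multiply two sparse matrices given as lists of (row, col, value) triples."""
    # Group the entries of b by row so each entry of a can find its partners.
    b_by_row = defaultdict(list)
    for k, j, v in b:
        b_by_row[k].append((j, v))

    # Accumulate a[i, k] * b[k, j] into out[i, j].
    out = defaultdict(float)
    for i, k, v in a:
        for j, w in b_by_row[k]:
            out[(i, j)] += v * w

    # Sort by (row, col), the ordering a sorted COO result would have.
    return [(i, j, v) for (i, j), v in sorted(out.items())]


# Toy 3 x 3 matrix with ones at (0, 0), (0, 2), (1, 1), (2, 2).
A = [(0, 0, 1.0), (0, 2, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
C = spspmm_reference(A, A)
print(C)
```

Every column index in the result of a product of two N x N matrices has to stay below N, which is the condition the failing assertion checks.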

The problem might be that torch.ops.torch_sparse.spspmm_sum, called at line 101 in torch_sparse's spspmm_sum, returns something wrong.

Any comment will be helpful.

Best,

minsikseo-cdl avatar Jan 04 '22 12:01 minsikseo-cdl

This looks related to https://github.com/rusty1s/pytorch_sparse/issues/174.

I sadly cannot reproduce this issue on my machine, so it would be great to have your support finding the cause of this issue. Is it possible for you to debug https://github.com/rusty1s/pytorch_sparse/blob/master/csrc/cuda/spspmm_cuda.cu to see which output produces row or col tensors with unreasonably high values? Let me know if you need any guidance in doing so.
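
As a starting point on the Python side, the returned indices can be scanned for entries that fall outside the expected size. The following is a minimal sketch in plain Python over lists, so it runs without a GPU. `find_out_of_range` is a hypothetical helper, not part of torch_sparse.

```python
def find_out_of_range(row, col, num_rows, num_cols):
    """Return (position, row, col) for every entry whose index lies outside the matrix."""
    bad = []
    for pos, (r, c) in enumerate(zip(row, col)):
        if not (0 <= r < num_rows and 0 <= c < num_cols):
            bad.append((pos, r, c))
    return bad


# Toy 3 x 3 example: the last entry has column index 7, which would
# trip an assertion like `int(col.max()) < N`.
row = [0, 1, 2]
col = [0, 2, 7]
print(find_out_of_range(row, col, 3, 3))
```

On real index tensors the same idea could be expressed with a boolean mask such as `col >= N`; knowing the positions of the offending entries may help locate the kernel step that writes them.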

rusty1s avatar Jan 08 '22 12:01 rusty1s

Hello, is there any workaround for this issue? I hit this assertion error as well, on CUDA 10.2.

edge_index, edge_weight = spspmm(edge_index, edge_weight, edge_index, edge_weight, num_nodes, num_nodes, num_nodes)

File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/spspmm.py", line 30, in spspmm
    C = matmul(A, B)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 140, in matmul
    return spspmm(src, other, reduce)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 117, in spspmm
    return spspmm_sum(src, other)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/matmul.py", line 107, in spspmm_sum
    sparse_sizes=(M, K), is_sorted=True)
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/tensor.py", line 38, in __init__
    trust_data=trust_data,
  File "/mnt/iusers01/fse-ugpgt01/compsci01/xxxx/.conda/envs/graph_ae/lib/python3.7/site-packages/torch_sparse/storage.py", line 77, in __init__
    assert trust_data or int(col.max()) < N
AssertionError

Any comment is helpful! Thank you,

dgm2 avatar Apr 18 '22 13:04 dgm2

A current workaround may be to check whether the newly added sparse matrix multiplication for torch.sparse_csr_tensor, built directly into PyTorch, works for you (see here). Let me know.

rusty1s avatar Apr 18 '22 13:04 rusty1s

The torch version gives an error about size. It expects the last index of crow to be 8629? Any idea how to get this to work with the torch version?

The torch-sparse version does not raise the issue on this setup.

crow_indices.numel() must be size(0) + 1, but got: 8629
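
This message refers to the CSR layout invariant: the compressed row pointer holds one entry per row plus a final entry equal to the number of nonzeros. A minimal sketch of how such a pointer is built from sorted COO row indices, in plain Python, with `build_crow_indices` as a hypothetical helper name:

```python
def build_crow_indices(sorted_rows, num_rows):
    """Compress sorted COO row indices into a CSR row pointer of length num_rows + 1."""
    # Count the nonzeros in each row (empty rows keep a count of 0).
    counts = [0] * num_rows
    for r in sorted_rows:
        counts[r] += 1

    # A running sum turns the counts into start offsets; the last entry is nnz.
    crow = [0]
    for n in counts:
        crow.append(crow[-1] + n)
    return crow


rows = [0, 0, 2, 3, 3, 3]
crow = build_crow_indices(rows, num_rows=4)
print(crow)  # [0, 2, 2, 3, 6]
```

A mismatch therefore means the row pointer and the declared number of rows disagree. Which of the two is off in the report above is not clear from the thread.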

dgm2 avatar Apr 18 '22 14:04 dgm2

I tried converting the SparseTensors

row, col, value = torch.sparse.mm(A.to_torch_sparse_csr_tensor(), B.to_torch_sparse_csr_tensor())

gives

    return torch._sparse_mm(mat1, mat2)
RuntimeError: torch.empty: Only 2D sparse CSR tensors are supported.

dgm2 avatar Apr 18 '22 14:04 dgm2

What do A and B look like? Aren't they two-dimensional? What shape do the value tensors of A and B have? This might also be the reason for the error inside torch-sparse, since our sparse-matrix multiplication also requires two-dimensional matrices.

rusty1s avatar Apr 19 '22 12:04 rusty1s

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions[bot] avatar Oct 17 '22 02:10 github-actions[bot]