[Bug] Using LIBXSMM gives a wrong SpMM output for certain cases

Open davidmin7 opened this issue 3 years ago • 1 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Build the latest (or anything after 0.7) using cmake -DUSE_CUDA=ON -DBUILD_TORCH=ON -DUSE_LIBXSMM=ON .. and run the following code.
import torch
import dgl
import dgl.function as fn
from ogb.nodeproppred import DglNodePropPredDataset

data = DglNodePropPredDataset(name='ogbn-papers100M', root='./')
g, labels = data[0]
g = dgl.to_bidirected(g)
g.ndata['temp'] = torch.ones(g.num_nodes())
g.update_all(fn.copy_u('temp', 'm'), fn.sum('m', 'temp'))
  2. The resulting outputs of g.ndata['temp'] are all zeros.
>>> g.ndata['temp']
tensor([0., 0., 0.,  ..., 0., 0., 0.])

Expected behavior

  1. Build the latest (or anything after 0.7) using cmake -DUSE_CUDA=ON -DBUILD_TORCH=ON -DUSE_LIBXSMM=OFF .. and run the following code.
import torch
import dgl
import dgl.function as fn
from ogb.nodeproppred import DglNodePropPredDataset

data = DglNodePropPredDataset(name='ogbn-papers100M', root='./')
g, labels = data[0]
g = dgl.to_bidirected(g)
g.ndata['temp'] = torch.ones(g.num_nodes())
g.update_all(fn.copy_u('temp', 'm'), fn.sum('m', 'temp'))
  2. The resulting outputs of g.ndata['temp'] are now correct.
>>> g.ndata['temp']
tensor([ 1.,  8.,  7.,  ..., 15.,  1.,  2.])

Environment

  • DGL Version (e.g., 1.0): master, cac25f63
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.9
  • OS (e.g., Linux): CentOS 8
  • How you installed DGL (conda, pip, source): source
  • Build command you used (if compiling from source): Explained above
  • Python version: 3.8
  • CUDA/cuDNN version (if applicable): 10.2
  • GPU models and configuration (e.g. V100): V100
  • Any other relevant information: Intel Xeon Gold 6230

Additional context

Not entirely sure, but it seems the problem occurs when the number of edges is very large.
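One way to narrow this down might be to compare the DGL output against an independent reference on graphs of increasing size. Since the feature is all ones, copy_u followed by sum should give each node's in-degree. The sketch below is a plain NumPy reference for that aggregation. The helper name reference_copy_u_sum and the tiny edge list are hypothetical and used only for illustration; they are not part of DGL.

```python
import numpy as np

def reference_copy_u_sum(src, dst, feat, num_nodes):
    """Reference for copy_u + sum: out[v] = sum of feat[u] over edges u -> v.

    Hypothetical helper for illustration; not a DGL function.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    feat = np.asarray(feat, dtype=np.float64)
    out = np.zeros(num_nodes, dtype=np.float64)
    # np.add.at accumulates correctly when dst contains repeated indices.
    np.add.at(out, dst, feat[src])
    return out

# Tiny made-up graph with edges 0->1, 0->2, 1->2, 3->2.
src = [0, 0, 1, 3]
dst = [1, 2, 2, 2]
num_nodes = 4

expected = reference_copy_u_sum(src, dst, np.ones(num_nodes), num_nodes)
# With an all-ones feature, the result equals the in-degree of each node.
in_degrees = np.bincount(dst, minlength=num_nodes)

print(expected)  # [0. 1. 3. 0.]
```

On the real graph, the same check could be done without the helper by comparing g.ndata['temp'] against g.in_degrees() converted to float. Running it on small graphs first would show whether the mismatch appears only at large edge counts.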

davidmin7 • Nov 01 '21 23:11

Let me look into this one

sanchit-misra • Nov 08 '21 07:11

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] • Aug 28 '22 01:08