
Error occurs when sampling with edge_dir='out'. (Argument `rhs_nodes` must contain all the edge destination nodes.)

Open aries-M opened this issue 3 years ago • 4 comments

🐛 Bug

I get the error "Argument rhs_nodes must contain all the edge destination nodes." when fetching a batch of nodes with a sampler and DataLoader. With edge_dir='in', it works fine.

To Reproduce

import dgl
import numpy as np
import scipy.sparse as sp

# Directed graph with 6 nodes and 6 edges (src[i] -> dst[i]).
src = [0, 1, 2, 3, 4, 5]
dst = [4, 2, 1, 1, 5, 3]
etype_id = [2, 3, 1, 0, 1, 1]  # not used in this reproduction

n_nodes = 6
train_nid = [1, 2, 3]
coo = sp.coo_matrix((np.ones(len(src)), (src, dst)),
                    shape=(n_nodes, n_nodes))
g = dgl.from_scipy(coo)

# Sample along outgoing edges; this is the setting that triggers the error.
nei_sample = dgl.dataloading.MultiLayerNeighborSampler(
    [5, 5, 5], edge_dir='out')

dataloader = dgl.dataloading.DataLoader(
    g, train_nid, nei_sample,
    batch_size=2, shuffle=True, drop_last=False, num_workers=4)

# Fetching the first batch raises the rhs_nodes error.
loader_iter = iter(dataloader)
current = next(loader_iter)

Expected behavior

How can this error be avoided when edge_dir='out' is needed?

Environment

  • DGL version: 0.9.0
  • Backend library and version: PyTorch 1.12.1
  • OS: Linux
  • How you installed DGL: pip
  • Python version: 3.9
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: Tesla V100-PCIE-16GB

aries-M, Sep 06 '22 09:09

This looks like a bug in the sampler. The to_block call here (https://github.com/dmlc/dgl/blob/d41d07d0f6cbed17993644b58057e280a9e8f011/python/dgl/dataloading/neighbor_sampler.py#L115) wrongly uses seed_nodes as the destination nodes when edge_dir is 'out'. @BarclayII for awareness.

rudongyu, Sep 08 '22 06:09
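
To make the diagnosis above concrete, here is a minimal pure-Python sketch of why the check can fail. It is a simplified model and not DGL's implementation: `sample_edges` and `dst_covered` are hypothetical helpers, fanout limits are ignored, and the seed batch [2, 3] is chosen for illustration.

```python
# Edges from the reproduction script: src[i] -> dst[i].
src = [0, 1, 2, 3, 4, 5]
dst = [4, 2, 1, 1, 5, 3]
edges = list(zip(src, dst))


def sample_edges(seeds, edge_dir):
    """Hypothetical helper: keep every edge that touches a seed on the
    side selected by edge_dir (no fanout limit)."""
    seeds = set(seeds)
    if edge_dir == 'in':
        return [(u, v) for u, v in edges if v in seeds]
    return [(u, v) for u, v in edges if u in seeds]


def dst_covered(seeds, sampled):
    """Hypothetical stand-in for the failing check: every edge destination
    must be among the nodes passed as destination nodes (here, the seeds)."""
    return {v for _, v in sampled} <= set(seeds)


seeds = [2, 3]
in_edges = sample_edges(seeds, 'in')    # edges pointing into the seeds
out_edges = sample_edges(seeds, 'out')  # edges leaving the seeds

print(in_edges, dst_covered(seeds, in_edges))
print(out_edges, dst_covered(seeds, out_edges))
```

With 'in', every sampled edge ends at a seed, so the seeds always cover the destinations. With 'out', the sampled edges end at the seeds' out-neighbors (node 1 in this batch), which is why passing the seeds as destination nodes trips the check.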

Hi, @aries-M. We'd like to understand the scenario in which you use the sampler with the 'out' direction. Will message passing follow the same direction? It would help if you could provide some references. We will re-evaluate whether to keep this option.

rudongyu, Sep 20 '22 03:09

Hi @rudongyu. Thanks for your reply. I am trying to sample nodes on a directed graph where nodes are documents and edges are references between documents. The message passing follows the same direction.

aries-M, Sep 20 '22 05:09

Hi @aries-M. I think in your case, reversing the edges with dgl.reverse would be a better choice.

We plan to deprecate the edge_dir='out' option for sampling in the future.

rudongyu, Sep 26 '22 07:09