Error occurs when sampling with edge_dir='out'. (Argument `rhs_nodes` must contain all the edge destination nodes.)
🐛 Bug
I get the error "Argument rhs_nodes must contain all the edge destination nodes." when fetching a batch of nodes through a sampler and DataLoader with edge_dir='out'.
With edge_dir='in', the same code works fine.
To Reproduce
import dgl
import numpy as np
import scipy.sparse as sp

# Small directed graph: edge i goes from src[i] to dst[i].
src = [0, 1, 2, 3, 4, 5]
dst = [4, 2, 1, 1, 5, 3]
etype_id = [2, 3, 1, 0, 1, 1]  # not used below
n_nodes = 6
train_nid = [1, 2, 3]

coo = sp.coo_matrix((np.ones(len(src)), (src, dst)),
                    shape=[n_nodes, n_nodes])
g = dgl.from_scipy(coo)

# Three-layer neighbor sampling along outgoing edges.
nei_sample = dgl.dataloading.MultiLayerNeighborSampler(
    [5, 5, 5], edge_dir='out')
dataloader = dgl.dataloading.DataLoader(
    g, train_nid, nei_sample,
    batch_size=2, shuffle=True, drop_last=False, num_workers=4)

# Fetching the first batch raises the error.
loader_iter = iter(dataloader)
current = next(loader_iter)
Expected behavior
Sampling should work with edge_dir='out'. How can this be fixed when edge_dir='out' is needed?
Environment
- DGL Version: 0.9.0
- Backend Library & Version: PyTorch 1.12.1
- OS: Linux
- How you installed DGL: pip
- Python version: 3.9
- CUDA/cuDNN version: 10.2
- GPU models and configuration: Tesla V100-PCIE-16GB
This looks like a bug in the sampler. The to_block call here https://github.com/dmlc/dgl/blob/d41d07d0f6cbed17993644b58057e280a9e8f011/python/dgl/dataloading/neighbor_sampler.py#L115 wrongly uses seed_nodes as the destination nodes when edge_dir is 'out'. @BarclayII for awareness.
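To make the constraint in the error message concrete, here is a plain-Python sketch using the edge list from the reproduction script. It is not DGL's implementation; the helper names `sampled_edges` and `destinations_covered` are hypothetical, and it takes every matching edge instead of sampling with a fanout. It only shows that when seeds are expanded along outgoing edges, the destinations of the collected edges need not be seed nodes.

```python
# Edge list from the reproduction script: edge i goes src[i] -> dst[i].
src = [0, 1, 2, 3, 4, 5]
dst = [4, 2, 1, 1, 5, 3]
edges = list(zip(src, dst))


def sampled_edges(seeds, edge_dir):
    """Hypothetical helper: all edges attached to the seeds in the given direction."""
    if edge_dir == "in":
        # Incoming edges: the seed is the destination.
        return [(u, v) for u, v in edges if v in seeds]
    # Outgoing edges: the seed is the source.
    return [(u, v) for u, v in edges if u in seeds]


def destinations_covered(seeds, edge_dir):
    """True if every destination of the collected edges is one of the seeds."""
    return all(v in seeds for _, v in sampled_edges(seeds, edge_dir))


seeds = {1, 3}
print(destinations_covered(seeds, "in"))   # True:  edges 2->1, 3->1, 5->3
print(destinations_covered(seeds, "out"))  # False: edge 1->2 ends at node 2, not a seed
```

With edge_dir='in' the seeds are by construction the destinations, so passing them as the destination set is safe. With edge_dir='out' they are the sources, which is consistent with the failure reported above.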
Hi, @aries-M. We'd like to know the scenario where you use the sampler with 'out' direction. Will the message passing follow the same direction? It would be helpful if some references can be provided. We will re-evaluate whether to keep this option.
Hi @rudongyu. Thanks for your reply. I am trying to sample nodes on a directed graph in which nodes are documents and edges are the references between documents. The message passing follows the same direction.
Hi @aries-M. I think in your case, using dgl.reverse to convert the edges to reversed ones will be a better choice.
In our future plan, we will deprecate this option of edge_dir='out' for sampling.