dgl
nn.Deepwalk does not work with OGBN-arxiv data, likely a bug?
🐛 Bug
nn.DeepWalk raises an IndexError when trained on the ogbn-arxiv dataset.
To Reproduce
import torch
from dgl.data import CoraGraphDataset
from dgl.nn import DeepWalk
from torch.optim import SparseAdam
from torch.utils.data import DataLoader
from sklearn.linear_model import LogisticRegression
from ogb.nodeproppred import DglNodePropPredDataset
from dgl.dataloading import GraphDataLoader
dataset = DglNodePropPredDataset(name='ogbn-arxiv')
g, label = dataset[0]
model = DeepWalk(g)
dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=128,
                        shuffle=False, collate_fn=model.sample)
optimizer = SparseAdam(model.parameters(), lr=0.01)
num_epochs = 5
import time
for epoch in range(num_epochs):
    for batch_idx, batch_walk in enumerate(dataloader):
        tic = time.time()
        print(batch_walk)
        loss = model(batch_walk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        batch_time = time.time() - tic
        print(
            f"Epoch {epoch:d}, Batch {batch_idx:d},"
            f"Loss: {loss:.5f}, Time {batch_time:.3f}"
        )
I got the following error message:
File "/home/user/anaconda3/envs/egs39/lib/python3.9/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
I expected the script to run to completion. I also visually checked the batch data (a 2-D tensor), and all the node IDs in it seemed valid.
Environment
- DGL Version (e.g., 1.0): 2.0.0
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): pip
- Build command you used (if compiling from source): just ran the code snippet
- Python version: 3.9
- CUDA/cuDNN version (if applicable): CPU only.
- GPU models and configuration (e.g. V100): N/A
- Any other relevant information:
Additional context
tensor([[ 0, 52893, 14528, ..., -1, -1, -1],
[ 1, 141692, 100594, ..., -1, -1, -1],
[ 2, 119218, 16921, ..., -1, -1, -1],
...,
[ 125, 42653, 125, ..., -1, -1, -1],
[ 126, 116629, 126, ..., 116629, 126, 116629],
[ 127, 45306, -1, ..., -1, -1, -1]])
I ran your code and found that the error occurred for the sampled batch above. As the matrix shows, many random walk traces stop early because the current node has no out-edges, and the remaining positions are filled with -1. Since -1 is not a valid index into the node embedding table, the embedding lookup raises the error.
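To make the failure mode concrete, here is a minimal sketch using only PyTorch. The toy batch, the table size, and the helper name `count_truncated_walks` are made up for illustration; they are not part of DGL. It counts the walks that contain -1 padding and shows that a plain `nn.Embedding` lookup on CPU rejects such a batch with an IndexError.

```python
import torch
from torch import nn


def count_truncated_walks(batch_walk: torch.Tensor) -> int:
    """Return how many walks (rows) contain at least one -1 padding entry."""
    return int((batch_walk == -1).any(dim=1).sum())


# Toy batch of three walks of length 4 (hypothetical data, not from ogbn-arxiv).
# Rows 0 and 2 stopped early and are padded with -1.
batch_walk = torch.tensor([
    [0, 3, 2, -1],
    [1, 2, 1, 2],
    [2, -1, -1, -1],
])
num_truncated = count_truncated_walks(batch_walk)

# A small embedding table with 4 rows, standing in for the node embeddings.
emb = nn.Embedding(num_embeddings=4, embedding_dim=8)
try:
    emb(batch_walk)  # -1 is outside the valid range [0, 4)
    lookup_failed = False
except IndexError:
    lookup_failed = True

print(num_truncated, lookup_failed)
```

Running a check like this on each `batch_walk` before calling the model is a quick way to confirm whether early-stopped walks are the cause.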
Generally, DeepWalk should be run on a bidirected version of ogbn-arxiv. To solve this problem, convert the graph to a bidirected graph by adding reverse edges.