
nn.Deepwalk does not work with OGBN-arxiv data, likely a bug?

Open HuangLED opened this issue 1 year ago • 3 comments

🐛 Bug

dgl.nn.DeepWalk fails with an IndexError when trained on the ogbn-arxiv dataset.

To Reproduce

import torch
from dgl.data import CoraGraphDataset
from dgl.nn import DeepWalk
from torch.optim import SparseAdam
from torch.utils.data import DataLoader
from sklearn.linear_model import LogisticRegression

from ogb.nodeproppred import DglNodePropPredDataset
from dgl.dataloading import GraphDataLoader

dataset = DglNodePropPredDataset(name='ogbn-arxiv')

g, label = dataset[0] 
model = DeepWalk(g)

dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=128,
                        shuffle=False, collate_fn=model.sample)
optimizer = SparseAdam(model.parameters(), lr=0.01)
num_epochs = 5

import time
for epoch in range(num_epochs):
    for batch_idx, batch_walk in enumerate(dataloader):
        tic = time.time()
        print(batch_walk)
        loss = model(batch_walk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        batch_time = time.time() - tic
        print(
            f"Epoch {epoch:d}, Batch {batch_idx:d},"
            f"Loss: {loss:.5f}, Time {batch_time:.3f}"
        )

I got the following error message:

File "/home/user/anaconda3/envs/egs39/lib/python3.9/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Expected behavior: the script should run to completion. I also visually inspected the batch data (a 2-D tensor), and all the node IDs in it seemed valid.

Environment

  • DGL Version (e.g., 1.0): 2.0.0
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0
  • OS (e.g., Linux): Linux
  • How you installed DGL (conda, pip, source): pip
  • Build command you used (if compiling from source): just ran the code snippet
  • Python version: 3.9
  • CUDA/cuDNN version (if applicable): CPU only.
  • GPU models and configuration (e.g. V100): N/A
  • Any other relevant information:

Additional context

HuangLED avatar Mar 01 '24 01:03 HuangLED

tensor([[     0,  52893,  14528,  ...,     -1,     -1,     -1],
        [     1, 141692, 100594,  ...,     -1,     -1,     -1],
        [     2, 119218,  16921,  ...,     -1,     -1,     -1],
        ...,
        [   125,  42653,    125,  ...,     -1,     -1,     -1],
        [   126, 116629,    126,  ..., 116629,    126, 116629],
        [   127,  45306,     -1,  ...,     -1,     -1,     -1]])

I ran your code and found that the error occurred for the sampled batch above. As the matrix shows, many random walk traces stop early because the current node has no out-edges (indicated by the -1 entries). Since -1 is not a valid index for the node embedding lookup, it triggers the error.
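The failure mode described above can be reproduced without DGL. The following is a minimal sketch, assuming a toy `nn.Embedding` table (`node_embed`, a hypothetical name) in place of the model's internal embeddings and a hand-written `batch_walk` shaped like the printed batch. It shows how to detect truncated walks and that a CPU lookup with -1 raises the same `IndexError`.

```python
import torch
from torch import nn

# Toy embedding table standing in for the model's node embeddings (assumption).
num_nodes, emb_dim = 5, 4
node_embed = nn.Embedding(num_nodes, emb_dim, sparse=True)

# Two walks shaped like the printed batch: the second one stops early
# and is padded with -1.
batch_walk = torch.tensor([[0, 3, 2, 4],
                           [1, 2, -1, -1]])

# Rows that contain -1 are walks that terminated early.
truncated = (batch_walk == -1).any(dim=1)

# Looking up -1 on CPU raises "IndexError: index out of range in self".
try:
    node_embed(batch_walk)
    lookup_failed = False
except IndexError:
    lookup_failed = True

# Keeping only complete walks makes the lookup valid again.
complete_walks = batch_walk[~truncated]
embeddings = node_embed(complete_walks)
```

Dropping truncated walks is only a diagnostic workaround here: on a graph where most walks hit a dead end, it would discard most of the training data.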

rudongyu avatar Mar 06 '24 17:03 rudongyu

In general, DeepWalk on ogbn-arxiv should be run on a bidirected version of the graph. To fix this problem, convert the graph to a bidirected one by adding reverse edges.

rudongyu avatar Mar 07 '24 02:03 rudongyu

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] avatar Apr 07 '24 01:04 github-actions[bot]