MultiLayerFullNeighborSampler takes too much memory
🐛 Bug
Related discussion: https://discuss.dgl.ai/t/why-does-multilayerfullneighborsampler-consume-a-large-amount-of-memory/2454/12
MultiLayerFullNeighborSampler uses much more memory than expected: changing `MultiLayerFullNeighborSampler(num_layers)` to `MultiLayerNeighborSampler([-1 for _ in range(num_layers)])` makes the memory consumption drop substantially.
This might be because the implementation uses `in_subgraph` rather than `sample_neighbors`.
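If that hypothesis holds, the two code paths below should produce the same frontier while differing in cost. A minimal sketch on a toy graph (the graph and seed nodes are made up for illustration; that the full-neighbor sampler goes through `in_subgraph` is the guess above, not verified against the source):

```python
import dgl
import torch

# Toy graph and seeds, for illustration only.
g = dgl.graph((torch.tensor([0, 1, 2, 3]), torch.tensor([1, 2, 3, 0])))
seeds = torch.tensor([0, 1])

# Path suspected for MultiLayerFullNeighborSampler.
frontier_in = dgl.in_subgraph(g, seeds)

# Path taken by MultiLayerNeighborSampler; fanout -1 keeps every neighbor.
frontier_samp = dgl.sampling.sample_neighbors(g, seeds, -1)

# Both frontiers should contain exactly the in-edges of the seeds.
print(frontier_in.num_edges(), frontier_samp.num_edges())
```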
Confirmed on the master branch
To reproduce
Using examples/pytorch/graphsage/train_sampling.py, switch the sampler between `sampler = dgl.dataloading.MultiLayerFullNeighborSampler(4)` and `sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)])`, as shown below, and compare the memory usage.
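The two lines being compared (using `num_layers = 4` to match the full-neighbor call):

```python
num_layers = 4

# Variant 1: full-neighbor sampler (the high-memory case).
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)

# Variant 2: per-layer fanout of -1, i.e. all neighbors (much lower memory).
sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)])
```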
cc @BarclayII
Tried the following and I observed the same memory consumption pattern:
```python
import dgl
import torch
import resource

# Random bidirected graph: 1M nodes, 120M directed edges in total.
src = torch.randint(0, 1000000, (60000000,))
dst = torch.randint(0, 1000000, (60000000,))
ss = torch.cat([src, dst])
dd = torch.cat([dst, src])
g = dgl.graph((ss, dd), num_nodes=1000000)

sampler = dgl.dataloading.MultiLayerNeighborSampler([-1, -1, -1])
# sampler = dgl.dataloading.MultiLayerFullNeighborSampler(3)

dl = dgl.dataloading.NodeDataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=8)
for _ in dl:
    pass

# Peak resident set size of this (parent) process only;
# ru_maxrss is reported in kilobytes on Linux.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```
The memory consumption goes from 23GB to 89GB and then back to 23GB. So I think there's some intermediate operation that consumes a lot of memory for both samplers.
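To narrow down where the spike comes from, one can poll the resident set size of the main process and its worker children while iterating. A minimal sketch, assuming psutil is available (psutil, the helper name `log_rss`, and the 1-second interval are my choices, not from the original report):

```python
import threading
import time

import psutil

def log_rss(stop_event, interval=1.0):
    # Print the combined RSS of this process and its DataLoader workers.
    proc = psutil.Process()
    while not stop_event.is_set():
        total = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                total += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and reading
        print(f"total RSS: {total / 2**30:.1f} GiB")
        time.sleep(interval)

stop = threading.Event()
threading.Thread(target=log_rss, args=(stop,), daemon=True).start()
for _ in dl:  # `dl` from the repro script above
    pass
stop.set()
```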
Is this issue present when num_workers=0?
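For reference, that check only needs the loader rebuilt; with `num_workers=0` the sampling runs in the main process, which would rule out per-worker copies of the graph structure:

```python
dl = dgl.dataloading.NodeDataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=0)
```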
Which part of the pipeline causes such a large memory consumption?
Need to confirm the bug again for the latest release (0.9).
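A sketch of the same repro against the newer dataloading API (assuming a 0.9 install, where `dgl.dataloading.DataLoader` and `dgl.dataloading.NeighborSampler` supersede `NodeDataLoader` and `MultiLayerNeighborSampler`; `g` is the graph from the script above):

```python
import dgl
import torch

print(dgl.__version__)  # expecting 0.9.x

sampler = dgl.dataloading.NeighborSampler([-1, -1, -1])
# sampler = dgl.dataloading.MultiLayerFullNeighborSampler(3)
dl = dgl.dataloading.DataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=8)
for _ in dl:
    pass
```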
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
The memory consumption periodically goes up from 23GB to 60GB and then goes down.