MultiLayerFullNeighborSampler takes too much memory

Open VoVAllen opened this issue 4 years ago • 7 comments

🐛 Bug

Related discussion: https://discuss.dgl.ai/t/why-does-multilayerfullneighborsampler-consume-a-large-amount-of-memory/2454/12

MultiLayerFullNeighborSampler uses much more memory than expected. If I change MultiLayerFullNeighborSampler(num_layers) to MultiLayerNeighborSampler([-1 for _ in range(num_layers)]), the memory consumption is much lower.

This may be due to the implementation using in_subgraph rather than sample_neighbors.

Confirmed on the master branch

To reproduce

Use examples/pytorch/graphsage/train_sampling.py.

Set the sampler to sampler = dgl.dataloading.MultiLayerFullNeighborSampler(4), then to sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)]), and compare the memory consumption of the two runs.

cc @BarclayII

VoVAllen avatar Nov 04 '21 08:11 VoVAllen

Tried the following and I observed the same memory consumption pattern:

import dgl
import torch
import resource

# Random graph: 1M nodes, 60M random edges plus their reverses (120M edges in total).
src = torch.randint(0, 1000000, (60000000,))
dst = torch.randint(0, 1000000, (60000000,))
ss = torch.cat([src, dst])
dd = torch.cat([dst, src])
g = dgl.graph((ss, dd), num_nodes=1000000)

# Swap the two sampler lines below to compare the samplers.
sampler = dgl.dataloading.MultiLayerNeighborSampler([-1, -1, -1])
#sampler = dgl.dataloading.MultiLayerFullNeighborSampler(3)
dl = dgl.dataloading.NodeDataLoader(g, torch.arange(1000000), sampler, batch_size=1000, num_workers=8)
# Iterate over all batches without training, so only sampling cost is measured.
for _ in dl:
    pass

The memory consumption goes from 23GB to 89GB and then back to 23GB. So I think there's some intermediate operation that consumes a lot of memory for both samplers.
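
As a rough illustration of why keeping every neighbor can need a large intermediate buffer on a graph this dense, here is a minimal pure-Python sketch (not DGL code). It builds a scaled-down random graph with roughly the same average degree as the script above (about 120) and counts how many nodes and edges each hop touches when all in-neighbors are kept. The helper name `full_neighbor_frontiers` and the toy sizes are made up for illustration, and this does not establish that frontier growth is the actual cause of the spike.

```python
import random


def full_neighbor_frontiers(in_neighbors, seeds, num_layers):
    """Return (num_dst_nodes, num_edges_kept) per hop when no neighbor is dropped."""
    frontier = set(seeds)
    stats = []
    for _ in range(num_layers):
        # Every in-edge of every frontier node is kept (fanout of -1).
        num_edges = sum(len(in_neighbors[v]) for v in frontier)
        stats.append((len(frontier), num_edges))
        # The next hop starts from the frontier plus all of its in-neighbors.
        next_frontier = set(frontier)
        for v in frontier:
            next_frontier.update(in_neighbors[v])
        frontier = next_frontier
    return stats


random.seed(0)
num_nodes = 10_000           # toy size (assumption), 100x fewer nodes than the script
num_random_edges = 600_000   # each edge is also added reversed, as in the script
in_neighbors = [[] for _ in range(num_nodes)]
for _ in range(num_random_edges):
    u = random.randrange(num_nodes)
    v = random.randrange(num_nodes)
    in_neighbors[v].append(u)
    in_neighbors[u].append(v)

stats = full_neighbor_frontiers(in_neighbors, seeds=range(10), num_layers=3)
for hop, (n_dst, n_edges) in enumerate(stats, start=1):
    print(f"hop {hop}: {n_dst} destination nodes, {n_edges} edges kept")
```

On this toy graph the third hop already starts from nearly every node, so a single batch keeps most of the edges in the graph. If the real graph behaves similarly, several such batches could be in flight at once with num_workers=8.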

BarclayII avatar Nov 12 '21 05:11 BarclayII

Is this issue present when num_workers=0?
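
One way to check this is to record the peak resident set size of the main process and of its child processes separately. The sketch below uses only the standard `resource` module (Unix only); `peak_rss_mib` is a hypothetical helper, not part of DGL, and the byte-string allocation is just a stand-in for the data-loading loop. Note that `RUSAGE_CHILDREN` only covers children that have terminated and been waited for, and it reports the largest single child rather than a sum over all workers.

```python
import resource
import sys


def peak_rss_mib(who=resource.RUSAGE_SELF):
    """Peak resident set size in MiB for this process or for its reaped children."""
    peak = resource.getrusage(who).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    return peak / (1024 * 1024 if sys.platform == "darwin" else 1024)


before = peak_rss_mib()
payload = b"x" * (50 * 1024 * 1024)  # stand-in for the `for _ in dl: pass` loop
after = peak_rss_mib()

print(f"main process peak: {before:.0f} MiB -> {after:.0f} MiB")
print(f"largest reaped child peak: {peak_rss_mib(resource.RUSAGE_CHILDREN):.0f} MiB")
```

With num_workers=0 the sampling runs in the main process, so a spike would show up under RUSAGE_SELF; with num_workers=8 it would show up in the worker processes instead.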

nv-dlasalle avatar Dec 03 '21 19:12 nv-dlasalle

Which part causes such a large amount of memory to be used?

allencho1222 avatar Jan 03 '22 15:01 allencho1222

Need to confirm the bug again for the latest release (0.9).

jermainewang avatar Jul 25 '22 10:07 jermainewang

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

github-actions[bot] avatar Aug 25 '22 01:08 github-actions[bot]

The memory consumption periodically goes up from 23GB to 60GB and then goes down.

BarclayII avatar Sep 26 '22 07:09 BarclayII