MultiLayerFullNeighborSampler takes too much memory
🐛 Bug
Related discussion: https://discuss.dgl.ai/t/why-does-multilayerfullneighborsampler-consume-a-large-amount-of-memory/2454/12
MultiLayerFullNeighborSampler uses much more memory than expected: changing `MultiLayerFullNeighborSampler(num_layers)` to `MultiLayerNeighborSampler([-1 for _ in range(num_layers)])` makes the memory consumption drop substantially.
This might be because the implementation uses `in_subgraph` rather than `sample_neighbors`.
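If that hypothesis holds, the two code paths below should produce the same frontier while differing in cost. A minimal sketch on a toy graph (the graph and seed nodes are made up for illustration; that the full-neighbor sampler goes through `in_subgraph` is the guess above, not verified against the source):

```python
import dgl
import torch

# Toy graph and seeds, for illustration only.
g = dgl.graph((torch.tensor([0, 1, 2, 3]), torch.tensor([1, 2, 3, 0])))
seeds = torch.tensor([0, 1])

# Path suspected for MultiLayerFullNeighborSampler.
frontier_in = dgl.in_subgraph(g, seeds)

# Path taken by MultiLayerNeighborSampler; fanout -1 keeps every neighbor.
frontier_samp = dgl.sampling.sample_neighbors(g, seeds, -1)

# Both frontiers should contain exactly the in-edges of the seeds.
print(frontier_in.num_edges(), frontier_samp.num_edges())
```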
Confirmed on the master branch
To reproduce
Using examples/pytorch/graphsage/train_sampling.py, switch the sampler between `sampler = dgl.dataloading.MultiLayerFullNeighborSampler(4)` and `sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)])`, as shown below, and compare the memory usage.
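The two lines being compared (using `num_layers = 4` to match the full-neighbor call):

```python
num_layers = 4

# Variant 1: full-neighbor sampler (the high-memory case).
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)

# Variant 2: per-layer fanout of -1, i.e. all neighbors (much lower memory).
sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)])
```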
cc @BarclayII
Tried the following and I observed the same memory consumption pattern:
```python
import dgl
import torch
import resource

# Random bidirected graph: 1M nodes, 120M directed edges in total.
src = torch.randint(0, 1000000, (60000000,))
dst = torch.randint(0, 1000000, (60000000,))
ss = torch.cat([src, dst])
dd = torch.cat([dst, src])
g = dgl.graph((ss, dd), num_nodes=1000000)

sampler = dgl.dataloading.MultiLayerNeighborSampler([-1, -1, -1])
# sampler = dgl.dataloading.MultiLayerFullNeighborSampler(3)

dl = dgl.dataloading.NodeDataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=8)
for _ in dl:
    pass

# Peak resident set size of this (parent) process only;
# ru_maxrss is reported in kilobytes on Linux.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```
The memory consumption goes from 23GB to 89GB and then back to 23GB. So I think there's some intermediate operation that consumes a lot of memory for both samplers.
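To narrow down where the spike comes from, one can poll the resident set size of the main process and its worker children while iterating. A minimal sketch, assuming psutil is available (psutil, the helper name `log_rss`, and the 1-second interval are my choices, not from the original report):

```python
import threading
import time

import psutil

def log_rss(stop_event, interval=1.0):
    # Print the combined RSS of this process and its DataLoader workers.
    proc = psutil.Process()
    while not stop_event.is_set():
        total = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                total += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and reading
        print(f"total RSS: {total / 2**30:.1f} GiB")
        time.sleep(interval)

stop = threading.Event()
threading.Thread(target=log_rss, args=(stop,), daemon=True).start()
for _ in dl:  # `dl` from the repro script above
    pass
stop.set()
```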
Is this issue present when num_workers=0?
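For reference, that check only needs the loader rebuilt; with `num_workers=0` the sampling runs in the main process, which would rule out per-worker copies of the graph structure:

```python
dl = dgl.dataloading.NodeDataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=0)
```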
Which part of the pipeline causes such a large memory consumption?
Need to confirm the bug again for the latest release (0.9).
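A sketch of the same repro against the newer dataloading API (assuming a 0.9 install, where `dgl.dataloading.DataLoader` and `dgl.dataloading.NeighborSampler` supersede `NodeDataLoader` and `MultiLayerNeighborSampler`; `g` is the graph from the script above):

```python
import dgl
import torch

print(dgl.__version__)  # expecting 0.9.x

sampler = dgl.dataloading.NeighborSampler([-1, -1, -1])
# sampler = dgl.dataloading.MultiLayerFullNeighborSampler(3)
dl = dgl.dataloading.DataLoader(
    g, torch.arange(1000000), sampler, batch_size=1000, num_workers=8)
for _ in dl:
    pass
```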
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
The memory consumption periodically goes up from 23GB to 60GB and then goes down.