Mini-batch loading doesn't prevent memory allocation errors in DOMINANT

Open • joshred83 opened this issue 7 months ago • 1 comment

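Describe the bug Setting `batch_size` on DOMINANT does not stop `fit` from building a dense adjacency matrix for the full graph, so training on a large graph fails with a memory allocation error before any mini-batch is processed.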

To Reproduce Steps to reproduce the behavior:

```python
from pygod.detector import DOMINANT
from torch_geometric.datasets import EllipticBitcoinDataset

dataset = EllipticBitcoinDataset(root="data/elliptic")
data = dataset[0]
model = DOMINANT(batch_size=1024, epochs=2)
model.fit(data)
```

Expected behavior Reading the code suggests that the model should operate on mini-batches produced by DeepDetector's NeighborLoader routine. Instead, the adjacency matrix is computed for the full graph, which results in a memory error:

```
  File "/home/red/dl-graph/simplest.py", line 12, in <module>
    model.fit(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/detector/base.py", line 431, in fit
    self.process_graph(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/detector/dominant.py", line 139, in process_graph
    DOMINANTBase.process_graph(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/nn/dominant.py", line 132, in process_graph
    data.s = to_dense_adj(data.edge_index)[0]
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/torch_geometric/utils/_to_dense_adj.py", line 97, in to_dense_adj
    adj = scatter(edge_attr, idx, dim=0, dim_size=flattened_size, reduce='sum')
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/torch_geometric/utils/_scatter.py", line 75, in scatter
    return src.new_zeros(size).scatter_add_(dim, index, src)
RuntimeError: [enforce fail at alloc_cpu.cpp:118] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 166087221444 bytes. Error code 12 (Cannot allocate memory)
```
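For what it's worth, the requested size looks consistent with a dense float32 N x N adjacency matrix over every node in the graph. The quick check below assumes the Elliptic graph has 203,769 nodes (a figure not stated above) and 4 bytes per entry; `dense_adj_bytes` is just an illustrative helper name.

```python
def dense_adj_bytes(num_nodes: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to hold a dense num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * bytes_per_entry


# Assumption: node count of the Elliptic Bitcoin graph (not stated in this report).
ELLIPTIC_NUM_NODES = 203_769

full_graph_bytes = dense_adj_bytes(ELLIPTIC_NUM_NODES)
print(full_graph_bytes)                       # compare with the byte count in the error
print(f"{full_graph_bytes / 2**30:.1f} GiB")  # same figure in GiB
```

If that assumption holds, the allocation comes to roughly 155 GiB and depends only on the number of nodes, which would explain why changing `batch_size` has no effect on it.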

I'm not sure whether this is a limitation of the algorithm, or a bug. If it's a limitation of the algorithm, the documentation and error messages don't really explain what to expect or how to avoid it. 

Here's some system information:
## System Information

- **PyTorch version:**  
  `2.6.0+cu124`

- **PyTorch Geometric version:**  
  `2.6.1`

- **PyGOD version:**  
  `1.1.0`

- **Python version:**  
  `Python 3.10.17`

- **OS:**  
  `Linux The-Tarrasque 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`

- **CUDA version:**  
  `12.4` (nvcc: Cuda compilation tools, release 12.4, V12.4.131, build cuda_12.4.r12.4/compiler.34097967_0, built on Thu_Mar_28_02:18:24_PDT_2024)

joshred83 • Apr 19 '25 18:04

We are able to use DOMINANTBase with NeighborLoader as a workaround, but aren't sure whether this is good practice.

joshred83 • Apr 19 '25 18:04