
Using a custom graph storage object with the PyTorch TensorStorage on the CPU breaks the dataloader when pin_memory is not specified.

Open · nv-dlasalle opened this issue · 0 comments

🐛 Bug

In the dataloader, nothing is done with pin_prefetcher if the graph is not a DGLHeteroGraph object, so it defaults to None:
https://github.com/dmlc/dgl/blob/master/python/dgl/dataloading/dataloader.py#L776

Here pin_prefetcher gets passed to the feature storage as pin_memory:
https://github.com/dmlc/dgl/blob/master/python/dgl/dataloading/dataloader.py#L258

And finally here it gets passed to torch.empty(), which requires a bool, not NoneType:
https://github.com/dmlc/dgl/blob/master/python/dgl/storages/pytorch_tensor.py#L11
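A minimal illustrative sketch of the failure mode (this is not DGL's actual code; `fetch` here stands in for the `_fetch_cpu` path that ends at `torch.empty()`): a flag that is never set defaults to None and then reaches a keyword argument that must be a strict bool.

```python
# Illustrative sketch: a consumer that, like torch.empty(), rejects a
# pin_memory argument that is not a real bool.
def fetch(indices, pin_memory):
    # torch.empty() raises TypeError for pin_memory=None; we mimic that
    # strictness with an explicit type check.
    if not isinstance(pin_memory, bool):
        raise TypeError(
            f"pin_memory must be a bool, got {type(pin_memory).__name__}")
    return list(indices)

# Never set, because the graph is not a DGLHeteroGraph -> stays None.
pin_prefetcher = None

try:
    fetch([0, 1, 2], pin_memory=pin_prefetcher)
except TypeError as err:
    print(err)  # pin_memory must be a bool, got NoneType
```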

The final error is:

Traceback (most recent call last):
  File "node_classification.py", line 178, in <module>
    train(args, device, g, dataset, model)
  File "node_classification.py", line 140, in train
    for it, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 498, in __next__
    self._next_non_threaded() if not self.use_thread else self._next_threaded()
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 480, in _next_non_threaded
    batch, feats, stream_event = _prefetch(batch, self.dataloader, self.stream)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 321, in _prefetch
    feats = recursive_apply(batch, _prefetch_for, dataloader)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/utils/internal.py", line 996, in recursive_apply
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/utils/internal.py", line 996, in <listcomp>
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/utils/internal.py", line 996, in recursive_apply
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/utils/internal.py", line 996, in <listcomp>
    return [recursive_apply(v, fn, *args, **kwargs) for v in data]
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/utils/internal.py", line 998, in recursive_apply
    return fn(data, *args, **kwargs)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 290, in _prefetch_for
    return _prefetch_for_subgraph(item, dataloader)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 281, in _prefetch_for_subgraph
    NID, dataloader.device, dataloader.pin_prefetcher)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/dataloading/dataloader.py", line 265, in _prefetch_update_feats
    column.id_ or default_id, device, pin_prefetcher)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/storages/pytorch_tensor.py", line 40, in fetch
    pin_memory, **kwargs)
  File "/home/dominique/.local/lib/python3.6/site-packages/dgl-0.9-py3.6-linux-x86_64.egg/dgl/storages/pytorch_tensor.py", line 11, in _fetch_cpu
    pin_memory=pin_memory)
TypeError: empty() received an invalid combination of arguments - got (int, int, pin_memory=NoneType, dtype=torch.dtype), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
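One possible workaround (an assumption on my part, not a merged fix) is to normalize the flag to a real bool before it reaches `torch.empty()`-style consumers; the helper name `normalize_pin_memory` below is hypothetical:

```python
# Hypothetical helper: coerce an unset (None) pin_prefetcher flag to a
# concrete bool, treating "unset" as "do not pin".
def normalize_pin_memory(pin_prefetcher):
    # bool(None) is False, so this maps None -> False and keeps
    # explicit True/False values unchanged.
    return bool(pin_prefetcher)

print(normalize_pin_memory(None))   # False
print(normalize_pin_memory(True))   # True
```

On the user side, the error can likewise be avoided by passing pin_memory explicitly when constructing the dataloader, rather than leaving it unspecified.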

nv-dlasalle · Jul 27 '22 20:07