[GraphBolt] Cannot re-initialize CUDA in forked subprocess
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
- python3 examples/sampling/graphbolt/node_classification.py --num-workers 4
File "/opt/conda/envs/dgl-dev-gpu-dgl-0/lib/python3.10/site-packages/torch/utils/data/datapipes/_hook_iterator.py", line 183, in wrap_generator
response = gen.send(None)
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/base.py", line 209, in __iter__
data = recursive_apply(data, apply_to, self.device)
File "/home/ubuntu/workspace/dgl_0/python/dgl/utils/internal.py", line 1135, in recursive_apply
return fn(data, *args, **kwargs)
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/base.py", line 145, in apply_to
return x.to(device) if hasattr(x, "to") else x
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/minibatch.py", line 496, in to
setattr(self, attr, apply_to(getattr(self, attr), device))
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/minibatch.py", line 462, in apply_to
return recursive_apply(x, lambda x: _to(x, device))
File "/home/ubuntu/workspace/dgl_0/python/dgl/utils/internal.py", line 1135, in recursive_apply
return fn(data, *args, **kwargs)
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/minibatch.py", line 462, in <lambda>
return recursive_apply(x, lambda x: _to(x, device))
File "/home/ubuntu/workspace/dgl_0/python/dgl/graphbolt/minibatch.py", line 459, in _to
return x.to(device) if hasattr(x, "to") else x
File "/opt/conda/envs/dgl-dev-gpu-dgl-0/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
This exception is thrown by __iter__ of CopyTo(datapipe=ShardingFilterIterDataPipe, device=device(type='cuda'), extra_attrs=['seed_nodes'])
Expected behavior
Environment
- DGL Version (e.g., 1.0): master
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3):
- OS (e.g., Linux):
- How you installed DGL (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version (if applicable):
- GPU models and configuration (e.g. V100):
- Any other relevant information:
Additional context
I think this is due to not passing --mode=cpu-cuda. Do you think we should automatically set args.mode = cpu-cuda if the user passes num-workers > 0?
cpu-cuda works well.
pinned-cuda is mutually exclusive with num_workers > 0?
Yes, for GPU sampling, num_workers has to be 0, I believe. Is there any use case where passing a value greater than 0 would be useful when sampling on the GPU?
Then please throw an exception if such a contradiction occurs.
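A hedged sketch of that check (the flag names come from the discussion above; which modes count as GPU-side and where the check would live in the script are assumptions):

```python
def validate_mode(mode: str, num_workers: int) -> None:
    """Hypothetical check: GPU-side copying/sampling cannot be combined with
    forked dataloader workers, since CUDA cannot be re-initialized in them."""
    if num_workers > 0 and mode in ("pinned-cuda", "cuda-cuda"):
        raise ValueError(
            f"mode='{mode}' requires num_workers=0, but got "
            f"num_workers={num_workers}. Use --mode=cpu-cuda with workers, "
            "or set --num-workers=0 to sample on the GPU."
        )
```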
Hmm, I think we need to work on the dataloader argument checking overall. We could check where the CopyTo is, whether the feature store is pinned or on the device, whether the graph is pinned or on the device, etc.
However, how do you think we can check the graph in a general manner? The user might pass sample_neighbors, sample_layer_neighbors or any other custom datapipes. The features could be custom too.
@frozenbugs any insights here?
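Purely as an illustration of the kind of check being discussed, a sketch of a generic device check over whatever storages the user passes (the idea that each storage exposes a .device attribute is an assumption, not an actual GraphBolt API):

```python
import torch


def check_worker_compatibility(storages, num_workers: int) -> None:
    """Hypothetical helper: reject CUDA-resident storages when forked worker
    processes are requested, since forked workers cannot touch CUDA."""
    if num_workers == 0:
        return
    for storage in storages:
        device = getattr(storage, "device", None)
        if isinstance(device, torch.device) and device.type == "cuda":
            raise ValueError(
                f"{type(storage).__name__} is on {device}, which is "
                f"incompatible with num_workers={num_workers}; move it to "
                "CPU (optionally pinned) or set num_workers=0."
            )
```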
> However, how do you think we can check the graph in a general manner? The user might pass sample_neighbors, sample_layer_neighbors or any other custom datapipes. The features could be custom too.
I am not sure whether I understand this comment, but for the discussion before this one: I think the error message reported by Rui is not too bad. If we want to clarify it further, wrapping the call to copy_to in a Python try/except and improving the error message should be enough.
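A minimal sketch of that wrapping, assuming it would sit around the device transfer performed by CopyTo (the function and variable names here are simplified stand-ins, not the actual GraphBolt code):

```python
def copy_minibatch_to(minibatch, device):
    """Simplified stand-in for the transfer done inside CopyTo.__iter__."""
    try:
        return minibatch.to(device)
    except RuntimeError as err:
        if "Cannot re-initialize CUDA in forked subprocess" in str(err):
            raise RuntimeError(
                "CopyTo tried to move a minibatch to a CUDA device inside a "
                "forked dataloader worker. Set num_workers=0 for GPU-side "
                "copying, or keep CopyTo on the main process (e.g. "
                "--mode=cpu-cuda)."
            ) from err
        raise
```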