Get stuck when trying to generate a random bipartite graph with 2B edges
🐛 Bug
I tried the code below to generate a random bipartite graph. It succeeded with 1B edges (takes about 800 seconds) but gets stuck with 2B edges (more than a few hours). No error/exception is thrown. I tried to dig in and found that the insert below always fails once `selected.size() > 1073500*1000`. `1073500*1000` is a rough number, not deterministic, but it always fails around there. It seems `RandInt()` keeps returning duplicate values.
https://github.com/dmlc/dgl/blob/6e1be69a84ba3e17e8e4db3c3768448f3620ecf4/src/random/cpu/choice.cc#L95-L99
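To illustrate the symptom, here is a minimal Python sketch of the rejection-sampling loop in the linked `choice.cc` (the function and names here are hypothetical simplifications, not DGL's actual code). If the random generator can only ever produce a capped number of distinct values (for example because its output range is narrower than the population), the set of selected values stops growing and the loop spins forever:

```python
import random

def sample_without_replacement(num, population, randint):
    # Rejection sampling: keep drawing until `num` distinct values
    # have been collected (mirrors the loop in choice.cc).
    selected = set()
    max_tries = 100 * num  # guard so this sketch always terminates
    tries = 0
    while len(selected) < num and tries < max_tries:
        selected.add(randint(population))
        tries += 1
    return selected, tries

# Healthy RNG: covers the full population, finishes quickly.
ok, _ = sample_without_replacement(1000, 10_000,
                                   lambda n: random.randrange(n))

# Broken RNG whose output is capped below the requested sample count:
# it can never produce more than `cap` distinct values, so the loop
# only stops when the guard trips (in the real code, never).
cap = 500
stuck, tries = sample_without_replacement(
    1000, 10_000, lambda n: random.randrange(min(n, cap)))

print(len(ok), len(stuck) <= cap, tries)  # → 1000 True 100000
```

Note (an observation, not a confirmed diagnosis): the reported threshold `1073500*1000` is suspiciously close to 2^30 = 1073741824, which would be consistent with `RandInt()` being limited to a 32-bit-sized value range somewhere.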
To Reproduce
Steps to reproduce the behavior:
machine: x2idn.16xlarge, 1T CPU RAM, 64 CPUs.
```python
import time

import dgl

num_nodes = 5 * 1000 * 1000
num_edges = 2 * 1000 * 1000 * 1000
num_src_nodes = num_nodes // 2
num_dst_nodes = num_nodes - num_src_nodes
tic = time.time()
g = dgl.rand_bipartite('node1', 'edge', 'node2',
                       num_src_nodes, num_dst_nodes, num_edges)
```
Expected behavior
Generation should finish in time that scales roughly linearly from the 1B-edge case, or an error/exception should be thrown.
Environment
- DGL Version (e.g., 1.0): master, 0.9.x
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3):
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version (if applicable):
- GPU models and configuration (e.g. V100):
- Any other relevant information:
Additional context
Could you figure out what the population and the number of samples are? When sampling without replacement we do rejection sampling, so if the number of samples is very large this can indeed take a long time.
population: 2500K * 2500K = 6250B
sample: 2B
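Using these numbers, a quick back-of-envelope check (a standard coupon-collector-style estimate, not taken from the DGL code) suggests rejection overhead alone should not explain the hang: drawing k distinct samples from a population of n takes about `-n * ln(1 - k/n)` draws in expectation, which is barely more than k here.

```python
import math

n = 2_500_000 * 2_500_000   # population: 6250B candidate edges
k = 2_000_000_000           # samples: 2B edges

# Expected total draws for k distinct values out of n,
# sum_{i=0}^{k-1} n/(n-i) ~= -n * ln(1 - k/n).
expected_draws = -n * math.log1p(-k / n)
extra = expected_draws - k  # expected number of rejected (duplicate) draws

print(f"{extra:.0f}")  # roughly 3.2e5 extra draws, i.e. ~0.016% overhead
```

So with only ~320K expected rejections out of 2B draws, a correctly behaving sampler should scale almost linearly, which points back at the `RandInt()` duplication the report describes rather than at rejection sampling itself.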