Refactor the Python Sampling Algorithms
The Python sampling algorithms (particularly uniform_neighbor_sample) are overly complicated. They need to be redesigned to better suit user needs and to properly support the full suite of features available at the C++/C API/pylibcugraph level. For instance, they currently take a list of batch ids rather than a batch size parameter, which makes the code significantly more complicated and slower.
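To illustrate the batch size point, here is a minimal sketch of how batch ids could be derived internally from a single batch size parameter instead of being passed in by the caller. The function name `assign_batch_ids` and its signature are hypothetical and are not part of the existing cugraph or pylibcugraph API; the sketch also assumes seeds are batched in their input order.

```python
def assign_batch_ids(num_seeds: int, batch_size: int) -> list[int]:
    """Hypothetical helper: derive one batch id per seed from a batch size.

    Seeds are assumed to be batched in input order, so seed i falls into
    batch i // batch_size. The last batch may be smaller than batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [i // batch_size for i in range(num_seeds)]


# Example: five seed vertices split into batches of two.
seeds = [10, 11, 12, 13, 14]
batch_ids = assign_batch_ids(len(seeds), batch_size=2)
```

With this approach the caller only supplies the seeds and a batch size, and the per-seed batch ids never have to be built or validated in user code.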
Ultimately, the primary means of interacting with the sampling algorithms going forward will probably be pylibcugraph and the C++ API, so those should drive the new Python API. Some parameters, like label_to_output_comm_rank, should probably be dropped in favor of preserving the original seed location.
For future in-memory sampling, we may implement something like label_to_output_comm_rank that is more clever and evenly divides the samples from N samplers across M trainers.
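One possible way to divide samples evenly is a round-robin assignment of batches to trainers. The sketch below is only an illustration of that idea under stated assumptions: the function `distribute_batches`, its arguments, and the round-robin policy are all hypothetical and do not describe any existing or planned cugraph implementation.

```python
def distribute_batches(
    batches_per_sampler: list[int], num_trainers: int
) -> dict[int, list[tuple[int, int]]]:
    """Hypothetical sketch: spread batches from N samplers over M trainers.

    batches_per_sampler[s] is the number of batches produced by sampler s.
    Returns a mapping from trainer rank to a list of
    (sampler_rank, local_batch_index) pairs. Batches are handed out
    round-robin, so trainer loads differ by at most one batch.
    """
    if num_trainers <= 0:
        raise ValueError("num_trainers must be a positive integer")

    assignment = {rank: [] for rank in range(num_trainers)}
    global_index = 0  # running index over all batches from all samplers
    for sampler_rank, num_batches in enumerate(batches_per_sampler):
        for local_index in range(num_batches):
            trainer_rank = global_index % num_trainers
            assignment[trainer_rank].append((sampler_rank, local_index))
            global_index += 1
    return assignment


# Example: three samplers producing 3, 2, and 2 batches, two trainers.
assignment = distribute_batches([3, 2, 2], num_trainers=2)
```

Round-robin is the simplest balanced policy; a real implementation might instead prefer keeping batches on the trainer closest to the sampler that produced them, trading some balance for less data movement.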