graph-learn
How to deal with the boundary problem
As we know from the paper, graph-learn has implemented four built-in graph partition algorithms to minimize the number of crossing edges, i.e. edges whose endpoints live on different workers. But some crossing edges may still exist after graph partition.
For the graph below (3-hop), sampling proceeds as follows (a small simulation sketch follows the list):
- hop-1: get vertex 1's neighbor (vertex 3) from the local adj_matrix
- hop-2: sampling for vertex 3 needs a request from server 1 to server 2, which returns vertex 3's neighbor (vertex 2) from its adj_matrix
- hop-3: sampling for vertex 2 needs a request from server 2 back to server 1
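To make the cost concrete, here is a standalone sketch (plain Python, not graph-learn internals) that counts the cross-server requests for the 3-hop walk above. The partition map mirrors the example: vertices 1 and 2 on server 1, vertex 3 on server 2.

```python
partition = {1: "server-1", 2: "server-1", 3: "server-2"}
adj = {1: [3], 3: [2], 2: []}   # toy adjacency from the example graph

def count_crossings(seed, num_hops):
    frontier, crossings = [seed], 0
    for _ in range(num_hops):
        nxt = []
        for v in frontier:
            for u in adj[v]:
                # Expanding u in the next hop means asking the server
                # that owns u; the request crosses the network whenever
                # the owners of v and u differ.
                if partition[u] != partition[v]:
                    crossings += 1
                nxt.append(u)
        frontier = nxt
    return crossings

print(count_crossings(1, 3))  # hop-2 and hop-3 each cross -> prints 2
```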
I want to know how graph-learn deals with this boundary problem, or whether it provides optimization methods to avoid this situation.
From the code base, only hashed partitioning is supported.
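For reference, hashed partitioning assigns each vertex to a server purely by its id; a minimal sketch of the idea (not the actual C++ implementation):

```python
def hash_partition(vertex_id: int, num_servers: int) -> int:
    # Hashed partitioning: the owning server is id mod #servers.
    # It balances load well but ignores graph locality, so many
    # edges can end up with endpoints on different servers.
    return vertex_id % num_servers
```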
Graph partition is a classical problem that was proposed many years ago, but no universal solution has been found so far. In my view, we can develop various task-specific partition methods and then push them to GL. These methods need to balance system efficiency and algorithm accuracy, rather than only pursuing absolute correctness on the system side.
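As one hypothetical example of such a task-specific method (not part of graph-learn), a greedy, locality-aware placement can reduce crossing edges at the cost of load balance:

```python
from collections import Counter, defaultdict

def greedy_partition(edges, num_servers):
    """Place each vertex on the server that already owns most of its
    neighbors; fall back to the hash slot for the first placement."""
    adj = defaultdict(list)
    for s, d in edges:
        adj[s].append(d)
        adj[d].append(s)
    owner = {}
    for v in adj:
        votes = Counter(owner[u] for u in adj[v] if u in owner)
        # Majority vote among placed neighbors; hash as the default.
        owner[v] = votes.most_common(1)[0][0] if votes else v % num_servers
    return owner

# On the toy graph above, all three vertices land on one server:
# zero crossing edges, but worse load balance -- the trade-off
# between system efficiency and partition quality mentioned above.
print(greedy_partition([(1, 3), (3, 2)], 2))
```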
In our original paper, we proposed a caching strategy to speed up sampling. It works well when the graph data is huge. Due to its system complexity, it is not open-sourced in this version. Instead, we abstract a Partitioner module to enable solutions from contributors.
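The closed-source strategy itself is not available, but the general idea can be sketched independently: memoize the neighbor lists of frequently visited remote vertices so repeated hops skip the network round trip. A toy illustration, where fetch_remote_neighbors stands in for a hypothetical RPC:

```python
from functools import lru_cache

REMOTE_ADJ = {1: [3], 3: [2], 2: []}   # stand-in for another server's store

def fetch_remote_neighbors(vertex_id):
    # Hypothetical placeholder for an RPC to the server owning vertex_id.
    return REMOTE_ADJ.get(vertex_id, [])

@lru_cache(maxsize=100_000)
def cached_neighbors(vertex_id):
    # Hot boundary vertices hit this cache instead of the network
    # on every hop after the first.
    return tuple(fetch_remote_neighbors(vertex_id))
```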
If you find interesting solutions, contributions are welcome!
Um, I don't think this question has actually been answered ==!