cugraph
cugraph copied to clipboard
[FEA]: Topic: SG Egograph (Batched vs. Single Vertex)
Is this a new feature, an improvement, or a change to existing functionality?
Change
How would you describe the priority of this feature request
High
Please provide a clear description of problem this feature solves
Upon closer examination of Egonet.py, it has been discovered that there are a few inconsistencies with the behavior of ego_graph and batched_ego_graph based on what the documentation says.
According to the docs, the ego_graph function is supposed to compute the induced sub-graph of neighbors centered at a single node, n, and return the sub-graph. The batched_ego_graph function (in congruence with its design in the legacy API) is supposed to accept a List of n values, or seeds, and compute each induced sub-graph in "batch" mode (parallel).
However, currently both Python functions just call the same PLC algorithm, which raises two concerns:
- The PLC function doesn't actually run ego_graph in "batch" mode like it's designed to
- The single
ego_graphfunction can be called with multiplenvalues and returns a graph that doesn't specify the cut-offs (offsets).
Describe your ideal solution
The proposed solution for the Python API is to combined both functions into a single ego_graph function, which will recognize when it is given a single n or multiple seeds to compute.
Down the line, another parameter can be added which allows for proper "batch" mode, similar to how cugraph handles batched betweenness centrality.
def ego_graph(
G,
n,
radius=1,
center=True,
undirected=None,
distance=None,
batch_mode=False
):
New Parameters
----------
n : integer or List, cudf.Series, cudf.DataFrame
< update docstring to specify that it can compute a single n or multiple seeds >
batch_mode : Boolean
Can be set to True or False
Returns
----------
A Graph or Edge Lists with Offsets
Tasks:
- [x] current
batched_ego_graphsalgorithm can be given aDeprecationWarning. - [ ] Add true batch-mode support to
ego_graph
Describe any alternatives you have considered
The two functions could also remain the same, but then ego_graph would need to not allow multiple seeds in order to be differentiated from batched_ego_graph, aka, multiple ego_graph calls.
Additional context
This was discovered while looking into this bug on the MG implementation of ego_graph
Further context may be included in the discussion down below.
Code of Conduct
- [X] I agree to follow cuGraph's Code of Conduct
- [X] I have searched the open feature requests and have found no duplicates for this feature request
I'm not sure this analysis is correct - or I'm misinterpreting what your asserting.
Looking here: https://github.com/rapidsai/cugraph/blob/ddfaacf1bd9d305b31e2c55b7ae954ba13f21899/python/pylibcugraph/pylibcugraph/egonet.pyx#L89
It appears to me that the PLC function ego_graph operates on multiple seeds and does return the offsets array that allows the decomposition of the resulting graphs into separate graphs.
I do see that the python code is ignoring the offsets value in the ego_graph call and using it in the deprecated batched_ego_graph call. I see several TODO items in the current python code that are describing some of the issues you see here.
The consequence of ignoring the offsets value in ego_graph is that if you pass multiple seeds to ego_graph, the result will be a single graph that is the composite of all of the resulting graphs, which is incorrect behavior.
So I believe this is entirely a python issue to resolve. Although I'll investigate further if we can identify a PLC call that is generating an answer other than what is expected/required to handle this correctly.