cugraph icon indicating copy to clipboard operation
cugraph copied to clipboard

[FEA]: Topic: SG Egograph (Batched vs. Single Vertex)

Open nv-rliu opened this issue 1 year ago • 1 comments

Is this a new feature, an improvement, or a change to existing functionality?

Change

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

Upon closer examination of Egonet.py, it has been discovered that there are a few inconsistencies with the behavior of ego_graph and batched_ego_graph based on what the documentation says.

According to the docs, the ego_graph function is supposed to compute the induced sub-graph of neighbors centered at a single node, n, and return the sub-graph. The batched_ego_graph function (in congruence with its design in the legacy API) is supposed to accept a List of n values, or seeds, and compute each induced sub-graph in "batch" mode (parallel).

However, currently both Python functions just call the same PLC algorithm, which raises two concerns:

  1. The PLC function doesn't actually run ego_graph in "batch" mode like it's designed to
  2. The single ego_graph function can be called with multiple n values and returns a graph that doesn't specify the cut-offs (offsets).

Describe your ideal solution

The proposed solution for the Python API is to combined both functions into a single ego_graph function, which will recognize when it is given a single n or multiple seeds to compute.

Down the line, another parameter can be added which allows for proper "batch" mode, similar to how cugraph handles batched betweenness centrality.

def ego_graph(
    G,
    n,
    radius=1,
    center=True,
    undirected=None,
    distance=None,
    batch_mode=False
):

New Parameters
    ----------
    n : integer or List, cudf.Series, cudf.DataFrame
    < update docstring to specify that it can compute a single n or multiple seeds >

    batch_mode : Boolean
    Can be set to True or False

Returns
    ----------
A Graph or Edge Lists with Offsets

Tasks:

  • [x] current batched_ego_graphs algorithm can be given a DeprecationWarning.
  • [ ] Add true batch-mode support to ego_graph

Describe any alternatives you have considered

The two functions could also remain the same, but then ego_graph would need to not allow multiple seeds in order to be differentiated from batched_ego_graph, aka, multiple ego_graph calls.

Additional context

This was discovered while looking into this bug on the MG implementation of ego_graph

Further context may be included in the discussion down below.

Code of Conduct

  • [X] I agree to follow cuGraph's Code of Conduct
  • [X] I have searched the open feature requests and have found no duplicates for this feature request

nv-rliu avatar Feb 23 '24 22:02 nv-rliu

I'm not sure this analysis is correct - or I'm misinterpreting what your asserting.

Looking here: https://github.com/rapidsai/cugraph/blob/ddfaacf1bd9d305b31e2c55b7ae954ba13f21899/python/pylibcugraph/pylibcugraph/egonet.pyx#L89

It appears to me that the PLC function ego_graph operates on multiple seeds and does return the offsets array that allows the decomposition of the resulting graphs into separate graphs.

I do see that the python code is ignoring the offsets value in the ego_graph call and using it in the deprecated batched_ego_graph call. I see several TODO items in the current python code that are describing some of the issues you see here.

The consequence of ignoring the offsets value in ego_graph is that if you pass multiple seeds to ego_graph, the result will be a single graph that is the composite of all of the resulting graphs, which is incorrect behavior.

So I believe this is entirely a python issue to resolve. Although I'll investigate further if we can identify a PLC call that is generating an answer other than what is expected/required to handle this correctly.

ChuckHastings avatar May 22 '24 23:05 ChuckHastings