cugraph icon indicating copy to clipboard operation
cugraph copied to clipboard

[BUG] is_edge_ids in uniform_sampling does not seem to do anything

Open VibhuJawa opened this issue 3 years ago • 12 comments

Describe the bug

When we pass is_edge_ids=True we dont seem to do anything with it in the uniform_neighbor_sample function (see below). We should triage this and fix it accordingly.

https://github.com/rapidsai/cugraph/blob/7d8f0fd63ad58ce6deada5508bfc08ee9aa46d36/python/cugraph/cugraph/sampling/uniform_neighbor_sample.py#L23-L50

VibhuJawa avatar Aug 10 '22 22:08 VibhuJawa

MRE: EDIT: Cleaned up MRE to be easier to understand.


import cudf
import cugraph
import numpy as np

df = cudf.DataFrame({'source':[1,1],
                     'destination':[2,2],
                     'eids':[0,1]
                    })
df = df.astype(np.int32)


G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(df,edge_attr='eids')

sample_df = cugraph.uniform_neighbor_sample(
    G,
    cudf.Series([1]),
    fanout_vals=[2],
    is_edge_ids=True
)
print(sample_df)
   indices  sources  destinations
0        0        1             2

Expected behavior:

   indices  sources  destinations
0        0        1             2
1        1        1             2

VibhuJawa avatar Aug 15 '22 22:08 VibhuJawa

The C++ code operates only on vertices but I can always automatically get the start vertices from the provided edges. Also shouldn't the expected result be:

   indices  sources  destinations
0        1        1             2

alexbarghi-nv avatar Aug 18 '22 03:08 alexbarghi-nv

While we're addressing this, we should also update the docstring.

rlratzel avatar Aug 18 '22 14:08 rlratzel

import cudf
import cugraph
import numpy as np

df = cudf.DataFrame({'source':[1,1],
                     'destination':[2,2],
                     'eids':[0,1]
                    })
df = df.astype(np.int32)


G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(df,edge_attr='eids')

sample_df = cugraph.uniform_neighbor_sample(
    G,
    cudf.Series([1]),
    fanout_vals=[2],
    is_edge_ids=True
)
print(sample_df)

Nope, i want both the edges. Index 0 and Index 1. Maybe below example is easier to get my point across ?

The index_ids here are 10 and 11 and I do not renumber to be clearer.

import cudf
import cugraph
import numpy as np

df = cudf.DataFrame({'source':[1,1],
                     'destination':[2,2],
                     'eids':[10,11]
                    })
df = df.astype(np.int32)


G = cugraph.Graph(directed=True)
G.from_cudf_edgelist(df,edge_attr='eids', renumber=False, legacy_renum_only=False)

sample_df = cugraph.uniform_neighbor_sample(
    G,
    cudf.Series([1]).astype(np.int32),
    fanout_vals=[2],
    is_edge_ids=True
)
print(sample_df)
   sources  destinations  indices
0        1             2       10
0        1             2       11

VibhuJawa avatar Aug 18 '22 15:08 VibhuJawa

Ok, so you want the current behavior, but with the is_edge_ids flag specifying whether the weights are edge ids rather than always assuming they are, correct?

alexbarghi-nv avatar Aug 18 '22 15:08 alexbarghi-nv

My interpretation was that if is_edge_ids was true, it should treat the start ids as edge ids and automatically select the vertices from those edges.

alexbarghi-nv avatar Aug 18 '22 15:08 alexbarghi-nv

Ok, so you want the current behavior, but with the is_edge_ids flag specifying whether the weights are edge ids rather than always assuming they are, correct?

Yup. Thats my understanding and then treating sources, destinations,edge_ids as unique identifiers so that they are sampled.

VibhuJawa avatar Aug 18 '22 16:08 VibhuJawa

My interpretation was that if is_edge_ids was true, it should treat the start ids as edge ids and automatically select the vertices from those edges.

Okay. Yup, thats a perfectly viable understanding too . I guess we really need a docstring here to clarify what this was supposed to do.

VibhuJawa avatar Aug 18 '22 16:08 VibhuJawa

ok, so there are 2 problems here:

  1. You need to use cugraph.MultiGraph since cugraph.Graph will automatically drop duplicate edges
  2. Apparently it ignores the duplicate edge even when using a cugraph.MultiGraph which is either a bug in plc graph creation or uniform_neighbor_sample itself.

alexbarghi-nv avatar Aug 18 '22 16:08 alexbarghi-nv

(2) is linked to the count_and_remove_duplicates call currently in the C++ code for uniform_neighbor_sample. And I would say (1) is expected behavior that we won't fix.

alexbarghi-nv avatar Aug 18 '22 16:08 alexbarghi-nv

(2) is linked to the count_and_remove_duplicates call currently in the C++ code for uniform_neighbor_sample. And I would say (1) is expected behavior that we won't fix.

Thanks for helping with the triage @alexbarghi-nv , I think we are now all in agreement.

VibhuJawa avatar Aug 18 '22 18:08 VibhuJawa

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 17 '22 19:09 github-actions[bot]