GraphScope icon indicating copy to clipboard operation
GraphScope copied to clipboard

[Proposal] APIs for subgraph with projection or filtering

Open yecol opened this issue 2 years ago • 5 comments

Proposal

get edges/vertices as dataframe

# return all edges, if g has only one label of edges, or g is a simple graph
# otherwise, raise an error;
# Compatible with NetworkX;
g.edges()

# return all edges with labeled as `L`
g.edges("L")
g.edges(label="L")

# with properties, if an array return edges dataframe with selected fields
g.edges(label="L", properties=["field_1", "field_2"])
# if an dict, return edges dataframe, with the properties satisfying the value equality.
g.edges(label="L", properties={"gender":"male", "area";"CA"})
# or matching specific src/dst id arrays;
g.edges(label="L", src_ids=[], dst_ids=[] )

# Similar to edges
# alias to g.nodes()
g.vertices()
g.nodes()

graph filtering, generate a new (property) graph

# Vertical selection named project, horizontal selection as subgraph

# if edges/vertices are None, induce a graph with given vertices/edges
g2 = g.project(vertices={"L1":[]})
g3 = g.project(vertices={"L1":[], "L2":["P1", "P2"]})
# new, property value quality

g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})

# fn is a lambda func, take a vertex as parameter, true/false as return;
g5 = g.subgraph(vertices={"L1":fn})
# a more complex way supporting predicate and range filtering
g6 = g.subgraph(GREMLIN)

graph transformation

# functions like vmap & emap; 
# fn: a lambda function, 
# input_fields: existing properties
# e.g., maybe we need to compute a pagerank using an existing `degree` field, 
# to generate a new graph with new field as `pr`
g7 =  g.vertices_apply(fn, [active_subset], labels=[])
# array -> dict, with default values.
g8 =  g.vertices_apply(fn, [active_subset], labels=[])

#lambda function for apply_vertex
def vertex_fn(v):
    u = Vertex(v)
    u["xxx"] = v["yyy"] + 1
    return u

# Proposal -2
def vertex_fn_2(v):
   builder = v.get_builder()
   builder["xxx"] =  v["yyy"] +1
   u = builder.seal()
   return u


# remark
# labels=[], apply on all vertex labels;
# if labels are not None, the input_fields and output_fields are given as an array;
# if labels are not None, the input_fields and output_fields are given as an dict, then the vertices without the given input_field will have a default value, in the new field.  


# (deprecated) apply on edges;
# a lambda function fn, will apply on 
# all edges with given labels, if src/dst=None, otherwise applied only on the edges matching the given src/dst array;
# lambda funcion in the form: f(src, e, dst) -> (src, e, dst),
# src_ifs, e_ifs, dst_ifs: input_fields, optional, if none, then all the properties are accessiable;
# src_ofs, e_ofs, dst_ofs: cannot be ALL empty, only the given fields are mutable.
# ifs/ofs can be dicts, with default values. 
# g9 =  g.edges_apply(fn, src=None, dst=None, labels=[], src_ifs=[], e_ifs=[], dst_ifs=[], src_ofs=[], e_ofs=[], dst_ofs=[]) 


g1 = g.graph_apply(fn)

def fn(frag):

    # get a builder of the fragment
    builder = frag.get_builder()
    # builder has the same methods to the frag listed in 
    # https://graphscope.io/docs/reference/cython_sdk.html#cython-sdk-api
    # e.g., builder.get_nodes_num()
    # Initially, the structure & data of builder is same to frag

    # Additional APIs:
    builder.add_column("prop_name")
    builder.add_column("on_label", "prop_name")
    builder.set_property(e, "prop_name", "prop_val")
    builder.set_property(v, "prop_name", "prop_val")

    v  = VertexRequest()
    v.label = "xxx"
    v.property["yyy"] = "zzz"
    builder.add_vertex(v)
    builder.add_vertices(v_df)
    builder.add_edge(e)

    new_frag = builder.seal()
    return new_frag

yecol avatar May 10 '22 06:05 yecol

Compare to NetworkX API:

  1. g.get_edges is similar to g.edges to get edges of g
  • g.edges return a view of edges;
  • g.edges not support Label;
  • g.edges accept bunch of nodes to only get edges of these nodes;
  • g.edges support looks up operation like g.edges[u, v];

I think we can unify the get_edges and g.edges of NetworkX, since label can be ignored in NetworkX Graph, And we can return view in gs graph too with fetch only when use strategy.

  1. g.get_vertices is similar to g.nodes to get vertices of g Similar to g.edges

  2. g.subgraph is similar to g.subgraph of NetworkX to create subgraph of g

  • But subgraph of NeworkX is finer-grained, takes a list of nodes as filter condition;

BTW, I suggest we can provide an api like g.neighbors(v) or g.adj(v) to get the neighbors of vertex, since this is a natural way to iterate edges in algorithm.

g.neighbor(v, Label=None, data=None)
# return neighbors  nodes of v (can select certain edge label neighbors and fetch edge data or not)
g.successors(v, Label=None, data=None)
# return successors  nodes of v
g.predecessors(v, Label=None, data=None)
# return predecessors  nodes of v

acezen avatar May 17 '22 08:05 acezen

Thanks @acezen , revised the proposal based on your comments.

yecol avatar May 20 '22 03:05 yecol

  • g_apply
  • interactive
  • create_new

yecol avatar May 20 '22 07:05 yecol

I suggest we can split subgraph to subgraph and edge_subgraph for clarifying the semantic

  1. subgraph for induce vertices subgraph
g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})

# fn is a lambda func, take a vertex as parameter, true/false as return;
g5 = g.subgraph(vertices={"L1":fn})
  1. edge_subgraph for induce edges subgraph
g4 = g.edge_subgraph(edges={"L1":{"P1":"value1", "P2":"value2"}})

# fn is a lambda func, take a edges as parameter, true/false as return;
g5 = g.edge_subgraph(edges={"L1":fn})

acezen avatar May 20 '22 08:05 acezen

in the original graph g, exists edges:

v1--e1-->v2;
v3--e2-->v2;

fn applied on edges will lead to conflicts on v2;

yecol avatar May 24 '22 02:05 yecol