GraphScope
GraphScope copied to clipboard
[Proposal] APIs for subgraph with projection or filtering
Proposal
get edges/vertices as dataframe
# return all edges, if g has only one label of edges, or g is a simple graph
# otherwise, raise an error;
# Compatible with NetworkX;
g.edges()
# return all edges with labeled as `L`
g.edges("L")
g.edges(label="L")
# with properties, if an array return edges dataframe with selected fields
g.edges(label="L", properties=["field_1", "field_2"])
# if an dict, return edges dataframe, with the properties satisfying the value equality.
g.edges(label="L", properties={"gender":"male", "area";"CA"})
# or matching specific src/dst id arrays;
g.edges(label="L", src_ids=[], dst_ids=[] )
# Similar to edges
# alias to g.nodes()
g.vertices()
g.nodes()
graph filtering, generate a new (property) graph
# Vertical selection named project, horizontal selection as subgraph
# if edges/vertices are None, induce a graph with given vertices/edges
g2 = g.project(vertices={"L1":[]})
g3 = g.project(vertices={"L1":[], "L2":["P1", "P2"]})
# new, property value quality
g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})
# fn is a lambda func, take a vertex as parameter, true/false as return;
g5 = g.subgraph(vertices={"L1":fn})
# a more complex way supporting predicate and range filtering
g6 = g.subgraph(GREMLIN)
graph transformation
# functions like vmap & emap;
# fn: a lambda function,
# input_fields: existing properties
# e.g., maybe we need to compute a pagerank using an existing `degree` field,
# to generate a new graph with new field as `pr`
g7 = g.vertices_apply(fn, [active_subset], labels=[])
# array -> dict, with default values.
g8 = g.vertices_apply(fn, [active_subset], labels=[])
#lambda function for apply_vertex
def vertex_fn(v):
u = Vertex(v)
u["xxx"] = v["yyy"] + 1
return u
# Proposal -2
def vertex_fn_2(v):
builder = v.get_builder()
builder["xxx"] = v["yyy"] +1
u = builder.seal()
return u
# remark
# labels=[], apply on all vertex labels;
# if labels are not None, the input_fields and output_fields are given as an array;
# if labels are not None, the input_fields and output_fields are given as an dict, then the vertices without the given input_field will have a default value, in the new field.
# (deprecated) apply on edges;
# a lambda function fn, will apply on
# all edges with given labels, if src/dst=None, otherwise applied only on the edges matching the given src/dst array;
# lambda funcion in the form: f(src, e, dst) -> (src, e, dst),
# src_ifs, e_ifs, dst_ifs: input_fields, optional, if none, then all the properties are accessiable;
# src_ofs, e_ofs, dst_ofs: cannot be ALL empty, only the given fields are mutable.
# ifs/ofs can be dicts, with default values.
# g9 = g.edges_apply(fn, src=None, dst=None, labels=[], src_ifs=[], e_ifs=[], dst_ifs=[], src_ofs=[], e_ofs=[], dst_ofs=[])
g1 = g.graph_apply(fn)
def fn(frag):
# get a builder of the fragment
builder = frag.get_builder()
# builder has the same methods to the frag listed in
# https://graphscope.io/docs/reference/cython_sdk.html#cython-sdk-api
# e.g., builder.get_nodes_num()
# Initially, the structure & data of builder is same to frag
# Additional APIs:
builder.add_column("prop_name")
builder.add_column("on_label", "prop_name")
builder.set_property(e, "prop_name", "prop_val")
builder.set_property(v, "prop_name", "prop_val")
v = VertexRequest()
v.label = "xxx"
v.property["yyy"] = "zzz"
builder.add_vertex(v)
builder.add_vertices(v_df)
builder.add_edge(e)
new_frag = builder.seal()
return new_frag
Compare to NetworkX
API:
-
g.get_edges
is similar tog.edges
to get edges of g
-
g.edges
return a view of edges; -
g.edges
not support Label; -
g.edges
accept bunch of nodes to only get edges of these nodes; -
g.edges
support looks up operation like g.edges[u, v];
I think we can unify the get_edges
and g.edges
of NetworkX, since label can be ignored in NetworkX Graph, And we can return view in gs graph too with fetch only when use
strategy.
-
g.get_vertices
is similar tog.nodes
to get vertices of g Similar tog.edges
-
g.subgraph
is similar tog.subgraph
of NetworkX to create subgraph of g
- But subgraph of NeworkX is finer-grained, takes a list of nodes as filter condition;
BTW, I suggest we can provide an api like g.neighbors(v)
or g.adj(v)
to get the neighbors of vertex, since this is a natural way to iterate edges in algorithm.
g.neighbor(v, Label=None, data=None)
# return neighbors nodes of v (can select certain edge label neighbors and fetch edge data or not)
g.successors(v, Label=None, data=None)
# return successors nodes of v
g.predecessors(v, Label=None, data=None)
# return predecessors nodes of v
Thanks @acezen , revised the proposal based on your comments.
-
g_apply
- interactive
- create_new
I suggest we can split subgraph
to subgraph
and edge_subgraph
for clarifying the semantic
-
subgraph
for induce vertices subgraph
g4 = g.subgraph(vertices={"L1":{"P1":"value1", "P2":"value2"}})
# fn is a lambda func, take a vertex as parameter, true/false as return;
g5 = g.subgraph(vertices={"L1":fn})
-
edge_subgraph
for induce edges subgraph
g4 = g.edge_subgraph(edges={"L1":{"P1":"value1", "P2":"value2"}})
# fn is a lambda func, take a edges as parameter, true/false as return;
g5 = g.edge_subgraph(edges={"L1":fn})
in the original graph g
, exists edges:
v1--e1-->v2;
v3--e2-->v2;
fn
applied on edges will lead to conflicts on v2
;