incubator-graphar
[Feat] Support multi-labels for a single vertex/edge
Is your feature request related to a problem? Please describe.
In graph databases such as Neo4j or NebulaGraph, a single vertex or edge can have multiple labels. For example, a vertex in a Neo4j graph could be labeled as a person as well as a student, so it has two labels: person and student. Currently, in GraphAr, a vertex or an edge can have only one label. GraphAr needs to support multiple labels to align with Neo4j and NebulaGraph.
Describe the solution you'd like
- Replace the label definition of GraphAr with vertex type/edge type.
- Store each label as a separate column (together, all labels of a vertex/edge table form a sparse matrix; see the GraphAr paper for more detail).
More detail: taking Parquet as an example, we can store each label as a separate column and use run-length encoding for the label columns. To check whether a vertex has the label person, just check whether its value in the person column is 0 or not. This makes filtering vertices by a specific label convenient and fast.
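To make the one-column-per-label idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the helper names (`to_label_columns`, `rle_encode`, `vertices_with_label`) and the sample data are hypothetical, and the run-length encoding is a simplified stand-in, not Parquet's actual on-disk encoding or GraphAr's implementation.

```python
from itertools import groupby


def to_label_columns(vertex_labels, all_labels):
    """Build one boolean column per label; row i belongs to vertex i."""
    return {
        label: [label in labels for labels in vertex_labels]
        for label in all_labels
    }


def rle_encode(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def vertices_with_label(rle_column):
    """Return the indices of vertices whose flag is set, walking only the runs."""
    ids = []
    offset = 0
    for value, length in rle_column:
        if value:
            ids.extend(range(offset, offset + length))
        offset += length
    return ids


# Hypothetical sample: five vertices, each with a set of labels.
vertex_labels = [
    {"person"},
    {"person", "student"},
    {"person", "student"},
    {"person"},
    set(),
]

columns = to_label_columns(vertex_labels, ["person", "student"])
encoded = {label: rle_encode(col) for label, col in columns.items()}

# Filtering by label only touches that label's column.
student_ids = vertices_with_label(encoded["student"])
person_ids = vertices_with_label(encoded["person"])
```

Because vertices with the same label tend to sit next to each other, each label column collapses into a few runs, and a label filter never has to read the other label columns.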
Describe alternatives you've considered
Storing a label list (a complex Array type) as a property on vertices/edges.
We prefer storing one label per column, because scanning vertices/edges by a specific label then performs better.
@freshyl @KateHed Can you help on this issue?
We seem to be reaching a point where more and more graphs are supporting multiple labels. Amazon Neptune would be another one that has this feature. I think TinkerPop will likely support this feature in the future given that there are so many graphs that feature it.
claim this task🙋🏻♂️