kuzu Support for Multiple Labels for nodes and edges

Discussed in https://github.com/kuzudb/kuzu/discussions/3114

with: @mnsRG @semihsalihoglu-uw @prrao87 @hpvd

Mar 22 '24 15:03 hpvd

If you are interested in this, please comment or add a +1.

Mar 22 '24 16:03 hpvd

If you have any ideas on

requirements,
use cases, or
a performant implementation,

please share!

Mar 22 '24 16:03 hpvd

some use cases for additional/multi labels:

separating graph parts (subgraphs)
help with sub indexes
manage access (rely on label in separate app)
store vector embeddings for similarity search of nodes and edges
store last change date
store previous name version
tag with technical/organizational domain
frequently needed precomputed results, e.g. number of connected nodes
...

Mar 22 '24 17:03 hpvd

Since I now understand from the discussion in https://github.com/kuzudb/kuzu/discussions/3114 that this is hard to implement well: thinking as an engineer, what do you think of an internal workaround that stores multiple labels within one label (a list) but makes them accessible as if they were separate labels? E.g. via match (n) where n:shirt and n:red return n;

Of course, from a performance point of view this would not be great, but it would offer the functionality. If someone one day finds the idea everyone is searching for, the workaround can be refactored into a native implementation.
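
To make the proposal concrete, here is a minimal sketch of the kind of rewrite such a workaround implies: label predicates like n:shirt are turned into membership checks on a single list property. This is plain Python string rewriting, not Kuzu's parser or binder; the function rewrite_label_predicates and the property name labels are hypothetical, and list_contains is simply the function name that comes up later in this thread.

```python
import re

# Hypothetical rewrite step (not part of Kuzu): find predicates of the form
# <variable>:<label> and replace them with a membership check on an assumed
# list property. It works on raw text, so it would also touch matching text
# inside string literals; a real implementation would work on a parsed query.
LABEL_PREDICATE = re.compile(r"\b([A-Za-z_]\w*):([A-Za-z_]\w*)\b")


def rewrite_label_predicates(where_clause, prop="labels"):
    """Rewrite `n:shirt` style predicates into list-membership checks."""

    def to_membership(match):
        variable, label = match.group(1), match.group(2)
        return f'list_contains({variable}.{prop}, "{label}")'

    return LABEL_PREDICATE.sub(to_membership, where_clause)


print(rewrite_label_predicates("n:shirt and n:red"))
# list_contains(n.labels, "shirt") and list_contains(n.labels, "red")
```

The sketch only covers the syntactic side; it says nothing about how the labels would be stored or indexed, which is where the performance concern above comes in.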

Apr 04 '24 20:04 hpvd

Hi @hpvd: I don't think we should provide a specialized feature for multi-labels without having a good foundation for it. What you are suggesting is equivalent to storing a "labels: List[STRING]" property on nodes and checking WHERE list_contains(n.labels, "red") AND list_contains(n.labels, "shirt"). So it should be done manually. It is also not very easy to do at the system level, even if it may look so. We would be changing the data model, so we would need a notion of multiple labels in the catalog, binder, grammar, and compiler, and all our data ingestion methods would need to be aware that nodes can have multiple labels, etc. To justify all these changes, I would want a more principled solution than a workaround.

That said, I am happy to provide a documentation page describing how to manually implement nodes with multiple labels using this specific manual workaround, i.e., having a labels: LIST[String] property.
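
As a rough illustration of what the manual workaround amounts to, the following sketch models it in plain Python rather than in Kuzu: each node carries a list-valued labels property, and a query keeps the nodes whose list contains every requested label. The node data and the helper match_all_labels are made up for illustration, and the local list_contains function only mimics the membership check named above.

```python
# In-memory stand-in for nodes with a list-valued `labels` property.
# The node contents are invented illustration data, not from any real dataset.
nodes = [
    {"id": 1, "labels": ["shirt", "red"]},
    {"id": 2, "labels": ["shirt", "blue"]},
    {"id": 3, "labels": ["hat", "red"]},
]


def list_contains(values, item):
    """Local stand-in for the membership check used in the WHERE clause."""
    return item in values


def match_all_labels(candidates, required):
    """Return the nodes whose `labels` list contains every required label."""
    return [
        node
        for node in candidates
        if all(list_contains(node["labels"], label) for label in required)
    ]


red_shirts = match_all_labels(nodes, ["red", "shirt"])
print([node["id"] for node in red_shirts])  # [1]
```

Note that in this model every query inspects the labels list of every candidate node, since the labels are an ordinary property and not something the system indexes.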

Apr 05 '24 03:04 semihsalihoglu-uw

Hi there, I’m very interested in multi-labels too. I find it a very elegant way of modelling data, since it provides “layers” that you can stack on top of your data. It’s very flexible, and you don’t have to move or copy your data elsewhere if you want, say, another approach to your data or have different business needs. You don't have to treat your labels as the only definition of your nodes, so it's closer to reality. I’ve been looking for a FLOSS “graph alternative to SQLite” or maybe a “tiny embedded Neo4j” for a long time now, so when I saw tags like “Cypher” and “embedded” I thought it could be Kuzu. I really think there is a place for such a local embedded graph solution. But to me the missing multi-label feature is kind of a bummer, since I designed my whole current project around this concept of layers. And apart from Neo4j, RedisGraph/FalkorDB and Neptune, it’s the same situation for quite a lot of graph databases. And while many of them claim to be way faster than Neo4j, they don’t provide the same feature set, so it's a bit of apples and oranges. But if you have any idea how to convert multi-label data modelling to single-label data modelling, that would be great :D

I mean, sure, you may find a workaround on the application side. The question is how much it costs versus a native implementation. I guess in the end there won’t be a magic cost-free solution; you need some sort of label indexing somewhere if you want to make labels first-class citizens in your modelling.

Naive solutions include, as you said before, a list of labels in a property (but then you have to read node properties each time you do a graph traversal) or, I don’t know, making a copy in each label's table/collection (ugly). My own (if your graph implementation is really fast at dealing with edges) is to have a special type of node (label) for the other labels, and then link your nodes to the relevant labels via a "labelled_by" edge. It’s still very much an ugly workaround, I guess, and you'd better have a fast lookup (O(1), hash-table kind of fast) because this table of labels will be huge.
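
The label-node idea can be sketched roughly as follows, again in plain Python and not in any particular graph database. The class LabelGraph and its methods are hypothetical; a dictionary stands in for the hash-table lookup from a label to the nodes attached to it by a labelled_by edge.

```python
from collections import defaultdict


class LabelGraph:
    """Toy model: each label is its own entry, linked to data nodes by
    `labelled_by` edges, with a hash-based lookup from label to nodes."""

    def __init__(self):
        # label name -> ids of the nodes that have a labelled_by edge to it
        self.labelled_by = defaultdict(set)

    def add_label(self, node_id, label):
        """Record a labelled_by edge from node_id to the given label."""
        self.labelled_by[label].add(node_id)

    def nodes_with_all(self, *labels):
        """Intersect the node sets of all requested labels."""
        if not labels:
            return set()
        node_sets = [self.labelled_by.get(label, set()) for label in labels]
        return set.intersection(*node_sets)


graph = LabelGraph()
graph.add_label(1, "shirt")
graph.add_label(1, "red")
graph.add_label(2, "shirt")
graph.add_label(3, "red")
print(graph.nodes_with_all("shirt", "red"))  # {1}
```

Compared with a labels list stored on each node, this layout starts from the labels and never scans unrelated nodes, at the price of maintaining one extra edge per node and label.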

You can also have a concept of a “main label” plus additional, less efficient ones. If I remember correctly, in Neo4j labels are stored along with nodes, but only the first 4-5 ones. I don’t know about RedisGraph's solution, but there is a dedicated branch to look into at https://github.com/RedisGraph/RedisGraph/tree/multi-label-node

I hope a solution can be found here; your project sounds really great.

Cheers

Apr 05 '24 07:04 celorn

Hi @celorn, many thanks for describing your use case for multi-labels. As described, I also think it's pretty useful to have this feature built in. Of course, a description of how to implement it manually, as offered by @semihsalihoglu-uw, would also be a good (first) step! Just to comment on your link to RedisGraph:

RedisGraph has implemented multi-label for nodes since 2.8, see https://github.com/RedisGraph/RedisGraph/releases/tag/v2.8.8
FalkorDB is the successor of RedisGraph (with the same core team, see https://news.ycombinator.com/item?id=37104193)

Apr 05 '24 07:04 hpvd
