Property Graphs
This deserves an issue to itself given the growing popularity of property graph databases and the opportunity for using RDF as an interchange framework between different databases. See also #20 Standardized n-ary relations (and property graphs) and #22 Language-tagged strings.
Property Graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, and the values may themselves be sets of property-value pairs, and so forth recursively. The link predicate or label can itself be treated as a kind of property.
It is possible to represent property graphs with reification, but that adds considerable complexity. We can easily annotate a node using a link to another node. However, we also need a way to link from a link or to a link. One approach is for each link to expose an identifier enabling the link to be treated as equivalent to an RDF blank node. Such identifiers are okay for links within the same graph and can be implicit in serialisations like Turtle* where a pair of curly braces implies a new identifier.
What if you want to make a link something that can be referenced stably from other graphs? That suggests the need for a means to associate the link with a named anchor that is unique within the graph. What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link? The answer would seem to be the graph that the link was defined in.
Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than about one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph, so as to avoid having to find this out by trying to dereference the node.
Yet another challenge is where you want to distinguish properties from other kinds of links. This would allow for visualisations where you can hide and reveal properties with a tabular presentation of property-value sets. See #37 Lack of RDF Visualisation Software.
It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object-oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or to the root of an n-ary chunk.
I also think support for property graphs is very important. However, my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism. So far I have not seen any big barriers to such an approach.
My 2 cents on some of your questions:
It is possible to represent property graphs with reification, but that adds considerable complexity.
Agreed. And I find myself recoiling in horror at the mere mention of reification. In my view, RDF reification should be deprecated, since named graphs are generally much better, though not needed for property graphs.
What if you want to make a link something that can be referenced stably from other graphs?
Then a URI should be used, consistent with existing RDF practice.
What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?
Although that could be done in existing TriG (for example) I do not think it should be supported in a new higher-level RDF language. I think an RDF molecule that represents an n-ary relation should exist entirely in each graph where it is used, and should be considered malformed if one tries to put part of it in one graph and part in another. The reason is that the user, by creating it as an n-ary relation, intended it to be treated as a single unit. However, there would be nothing wrong with asserting some new triples or a new n-ary relation that makes use of some of the constituents of another n-ary relation.
Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than about one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph, so as to avoid having to find this out by trying to dereference the node.
My gut feeling is that that should be done by attaching additional metadata triples to the graph URI, such as provenance.
Yet another challenge is where you want to distinguish properties from other kinds of links.
Yes. My assumption is that by coming up with a standard way to define n-ary relations, this ability will fall out as a natural consequence: a particular group of triples will be automatically identifiable as an n-ary relation comprised of those properties.
It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object-oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or to the root of an n-ary chunk.
Interesting idea! I wonder how the scope could be known, so that the interpretation would be stable in the face of changing data. It would be bad if x.foo were to select one property against one set of data, but a different property if more data were added. Anyone have thoughts on how this could be done?
I think that is tied to the cardinality of the property, i.e. whether "foo" is constrained to a singular value or can have multiple values (via multiple links with the same subject and predicate). Following the given path may thus return a set of nodes containing zero, one or multiple nodes. When we look at how to model n-ary chunks, we should also look at associated metadata including cardinality constraints, composite keys and so forth. What metadata would make data and rules easier to use by the vast majority of developers?
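To make the cardinality point concrete, here is a minimal sketch that follows a dotted path and always returns a set of nodes. It assumes a hypothetical object-graph representation in which each node maps a short property name to a set of target nodes; a singleton set plays the role of a single-valued property, and a missing name yields the empty set.

```python
from typing import Dict, Set

# Hypothetical object graph: node -> {short property name -> set of target nodes}.
Graph = Dict[str, Dict[str, Set[str]]]

def follow(graph: Graph, start: str, path: str) -> Set[str]:
    """Follow a dotted path such as "foo.bar" from start; return every node reached."""
    current = {start}
    for name in path.split("."):
        nxt: Set[str] = set()
        for node in current:
            # A node without this name contributes nothing, so a step may yield zero nodes.
            nxt |= graph.get(node, {}).get(name, set())
        current = nxt
    return current

graph: Graph = {
    "x": {"foo": {"a", "b"}},  # foo is multi-valued on x
    "a": {"bar": {"c"}},
    "b": {"bar": {"c", "d"}},
}

print(sorted(follow(graph, "x", "foo")))      # ['a', 'b']
print(sorted(follow(graph, "x", "foo.bar")))  # ['c', 'd']
print(sorted(follow(graph, "x", "baz")))      # []
```

Returning a set at every step means x.foo has the same type whether foo currently has zero, one or many values, which is one way to keep the interpretation stable as data is added.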
Path following is related to regular expressions and RDF shapes, as well as to XPath for XML. I've explored it in some experiments inspired by ATNs, see https://www.w3.org/WoT/demos/shrl/test.html
P.S. I am using the term "chunk" as it is popular in cognitive science and features prominently in cognitive architectures like CMU's ACT-R.
If someone wrote x.foo as a path, using short names, then I assume that each corresponding long name would be comprised of a namespace plus the short name. How would the system know which namespace to prepend to the short name? For example, if the current namespaces included both http://example/a# and http://example/b#, how would the system know whether foo should be expanded to http://example/a#foo or http://example/b#foo? Or do you envision this working some other way?
I assume that each corresponding long name would be comprised of a namespace plus the short name
No, that isn't the case. This is just a graph of objects where the object properties act as links to other objects, and each object property has a name that is scoped to that object. In RDF terms, the subject node + the property name provides a map to a predicate, and uniquely identifies a set of triples with that subject and predicate.
A restriction on this would be to constrain property names to uniquely identify predicates in this graph. This is tantamount to saying that the property name uniquely identifies the meaning of a property, rather than this being something specific to each object.
That is an overly strong constraint, as in the real world words are often used with different meanings depending on the context. However, there is nothing to prevent implementations from optimising how they handle this internally.
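The scoping described above, where the subject node plus the short name selects a predicate, can be sketched with a lookup table keyed on (subject, name). All identifiers here (`ex:river1`, `ex:riverBank`, and so on) are hypothetical; the second function checks the stricter, graph-wide restriction that each short name maps to only one predicate.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical scope table: (subject, short name) -> predicate URI.
# The same short name may map to different predicates for different subjects.
scope: Dict[Tuple[str, str], str] = {
    ("ex:river1", "bank"): "ex:riverBank",
    ("ex:acct1", "bank"): "ex:heldAtBank",
}

triples: Set[Triple] = {
    ("ex:river1", "ex:riverBank", "ex:eastShore"),
    ("ex:acct1", "ex:heldAtBank", "ex:firstBank"),
}

def lookup(subject: str, name: str) -> Set[str]:
    """Resolve a scoped short name to a predicate, then return the matching objects."""
    predicate = scope.get((subject, name))
    if predicate is None:
        return set()
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def names_are_global(table: Dict[Tuple[str, str], str]) -> bool:
    """Check the stricter rule: each short name maps to one predicate graph-wide."""
    seen: Dict[str, str] = {}
    for (_, name), predicate in table.items():
        if seen.setdefault(name, predicate) != predicate:
            return False
    return True

print(lookup("ex:river1", "bank"))  # {'ex:eastShore'}
print(lookup("ex:acct1", "bank"))   # {'ex:firstBank'}
print(names_are_global(scope))      # False: "bank" has two meanings here
```

An implementation could still build a global index internally when `names_are_global` happens to hold, which is the kind of optimisation mentioned above.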
I would like to pursue the possibility of encoding property graphs in standard RDF. Have others already done this? If so, what RDF patterns were used, and what limitations did they have?
Apart from reification, one approach that has been mentioned is to use a named graph that contains just the triple you want to annotate. This generalises to annotations on multiple triples, but I am unsure how you indicate that a given triple is in multiple named graphs. Another challenge is how you identify a graph when there isn't an explicit name for it, e.g. when using curly braces in Turtle* around the triples you want to annotate, this would imply an implicit blank node for the associated graph.
This makes me think about how to deal with graphs from an implementation perspective. One idea is to express the relationship between a triple and a graph as a property of the triple, where the property can have multiple values. Another idea is to allow for relationships between graphs, e.g. for one graph to be subsumed as part of another graph. A database could create its own internal identifiers and associate them with external identifiers when those are defined.
I wonder how this is dealt with by existing property graph database solutions?
I am unsure how you indicate that a given triple is in multiple named graphs.
You make several quads having the same <s,p,o>.
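As a small illustration, the sketch below uses plain tuples rather than an RDF library: the same triple appears in two quads, and grouping by (s, p, o) recovers the multi-valued "which graphs contain this triple" property suggested earlier. The `ex:` names and the `ex:meta` graph holding annotations are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)

quads: Set[Quad] = {
    ("ex:Alice", "ex:knows", "ex:Bob", "ex:g1"),
    ("ex:Alice", "ex:knows", "ex:Bob", "ex:g2"),  # same triple, second graph
    # Annotations are ordinary triples whose subject is a graph name.
    ("ex:g1", "ex:since", "2019", "ex:meta"),
    ("ex:g2", "ex:source", "ex:survey", "ex:meta"),
}

def graphs_of(quads: Set[Quad]) -> Dict[Triple, Set[str]]:
    """Group quads so that each triple maps to the set of graphs containing it."""
    index: Dict[Triple, Set[str]] = defaultdict(set)
    for s, p, o, g in quads:
        index[(s, p, o)].add(g)
    return index

index = graphs_of(quads)
print(sorted(index[("ex:Alice", "ex:knows", "ex:Bob")]))  # ['ex:g1', 'ex:g2']
```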
Property Graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, where the values may themselves be sets of property-values and so forth recursively.
The part about values being nested recursively is not true. Node and link (that is, vertex and edge) properties are plain old hashmaps, JS objects or dicts.
The link predicate or label can itself be treated as a kind of property.
Yes.
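A minimal sketch of the flat model described in this reply, assuming hypothetical `Vertex` and `Edge` classes (not any particular database's API): properties are a plain dict, and the edge label can be viewed as just one more key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Vertex:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)  # flat key-value pairs

@dataclass
class Edge:
    id: str
    start: str  # id of the start vertex
    end: str    # id of the end vertex
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)

    def as_properties(self) -> Dict[str, Any]:
        """View the label as one more key alongside the ordinary properties."""
        return {"label": self.label, **self.properties}

alice = Vertex("v1", {"name": "Alice"})
bob = Vertex("v2", {"name": "Bob"})
knows = Edge("e1", alice.id, bob.id, "knows", {"since": 2019})
print(knows.as_properties())  # {'label': 'knows', 'since': 2019}
```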
It is possible to represent property graphs with reification, but that adds considerable complexity.
What is reification? I looked around, but I still don't understand.
What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?
That is exactly what I meant by "it is advanced use" in this comment.
Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for.
I think we should come up with a representation of a property graph before trying to generalise to recursive or hierarchical graphs, or "meta-graphs".
True story: as part of a foolish attempt to replace the atomspace, I was thinking about how to implement this kind of thing: basically, a single entity called the atom that has outgoing and incoming links, and properties stored as a hashmap. Then came the idea of a "recursive hypergraph". As you wrote, it is complex just to imagine a node (or atom, in my case) pointing outside its own graph. Likewise, as you wrote, a node can represent another graph or sub-graph (since the structure is hierarchical, that makes sense). Again, I think it is the role of the reasoner / rule engine to deal with that kind of complexity. As part of my exploration I tried to implement it, but in the end there was really no way to "make it fast", and a priori you don't know when the "query" will end.
my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case
What are "n-ary relations", please?
It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages.
That is mostly what Gremlin (from TinkerPop) does. It is written like:
graph.vertices.filter(lambda x: x.type == 'actor').outgoing.filter(lambda x: x.genre == 'science-fiction')
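The line above is pseudocode in the style of a fluent traversal, not TinkerPop's actual API. To show how such a chain can work, here is a small runnable sketch; the `V`, `Traversal` and `Graph` classes are hypothetical, and missing vertex properties read as `None` so that filters never raise.

```python
from typing import Any, Callable, Iterable, Iterator, List

class V:
    """A vertex whose properties are readable as attributes (None if absent)."""

    def __init__(self, name: str, **props: Any) -> None:
        self.name = name
        self.out: List["V"] = []  # targets of outgoing edges
        self.props = props

    def __getattr__(self, key: str) -> Any:
        # Only called when normal attribute lookup fails.
        return self.__dict__.get("props", {}).get(key)

class Traversal:
    """A tiny fluent traversal; a hypothetical stand-in, not TinkerPop's API."""

    def __init__(self, items: Iterable[V]) -> None:
        self._items = list(items)

    def filter(self, keep: Callable[[V], bool]) -> "Traversal":
        return Traversal(v for v in self._items if keep(v))

    @property
    def outgoing(self) -> "Traversal":
        # Step from each vertex to the targets of its outgoing edges.
        return Traversal(t for v in self._items for t in v.out)

    def __iter__(self) -> Iterator[V]:
        return iter(self._items)

class Graph:
    def __init__(self, vertices: List[V]) -> None:
        self.vertices = Traversal(vertices)

actor = V("actor1", type="actor")
film1 = V("film1", type="film", genre="science-fiction")
film2 = V("film2", type="film", genre="drama")
actor.out += [film1, film2]
graph = Graph([actor, film1, film2])

result = graph.vertices.filter(lambda x: x.type == 'actor').outgoing.filter(lambda x: x.genre == 'science-fiction')
print([v.name for v in result])  # ['film1']
```

Each step returns a new `Traversal`, so steps can be chained in any order, which is what makes the dotted style read like property access.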
Some time ago I wrote an article on how to build a graph database on top of EAV. You can find it at https://hyper.dev/blog/diy-graph-database-in-python.html.
EAV is somewhat like a triplestore, but you cannot have multiple triples with the same subject and predicate. On top of that abstraction, I built a document store by grouping by subject. Each document has a private field that distinguishes nodes from edges. Edges also have two other private predicates, node-start and node-end.
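The design described above can be sketched as follows. This is not the article's code; the attribute names `_kind`, `_node_start` and `_node_end` are hypothetical stand-ins for the private fields mentioned, and the store simply keys values on (entity, attribute) so that a second write replaces the first.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple

class EAV:
    """Minimal EAV store: at most one value per (entity, attribute) pair."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Any] = {}

    def set(self, entity: str, attribute: str, value: Any) -> None:
        # Re-setting an (entity, attribute) pair overwrites the old value,
        # whereas a triplestore would keep both triples.
        self._data[(entity, attribute)] = value

    def documents(self) -> Dict[str, Dict[str, Any]]:
        """Group by entity (the subject) to get one document per entity."""
        docs: Dict[str, Dict[str, Any]] = defaultdict(dict)
        for (entity, attribute), value in self._data.items():
            docs[entity][attribute] = value
        return dict(docs)

db = EAV()
# Hypothetical private attributes: "_kind", "_node_start", "_node_end".
for entity, attribute, value in [
    ("n1", "_kind", "node"), ("n1", "name", "Alice"),
    ("n2", "_kind", "node"), ("n2", "name", "Bob"),
    ("e1", "_kind", "edge"), ("e1", "label", "knows"),
    ("e1", "_node_start", "n1"), ("e1", "_node_end", "n2"),
]:
    db.set(entity, attribute, value)

docs = db.documents()
edges = [doc for doc in docs.values() if doc["_kind"] == "edge"]
print(edges[0]["_node_start"], edges[0]["_node_end"])  # n1 n2
```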
Hello, I'd like to pick up this topic and discuss a specific question: how can you distinguish a property from a relation?
In RDF that is not possible, because there is no such distinction. Example:
<Alice> <knows> <Bob> .
and
<Alice> <mbox> <mailto:[email protected]> .
are completely equal in the sense that they are simple statements. But the meaning is very different, because Alice and Bob are persons: they are entities, i.e. they are things (resources) which have distinct existence. The second statement states that Alice has a property, i.e. a contact address. Although the mailbox is a URI, it has the flavor of a value, like
<Alice> <name> "Alice".
Of course, you can say that a mailbox is also an entity, but whether or not something is an entity is a decision made by the domain model. I think this is the crucial question when you want to bring Property Graphs and RDF together!
I also want to point out that the Property Graph is a technical way to do ER modelling: nodes become entities, edges become relations, and key-value pairs become attributes (that is, properties).
My vision is to create a unified graph model that embraces ER modelling and RDF at once.
In RDF you distinguish between objects that are URI resources and objects that are literals (datatyped values).
In this case <mailto:[email protected]> is a URI resource which can have its own triples. "[email protected]" would be a literal.
Absolute terms like "not possible" do not help IMO because while it may seem so to you coming from a different background, there are very good reasons why RDF is like it is, and formal theory behind them. RDF was designed for data interchange.
Are you familiar with RDF-star?
Sure, I know all that, and I am familiar with RDF*. I completely understand why RDF is designed the way it is. The question is not so much a technical one; it is more a theoretical one.
Of course you can write
<Alice> <mbox> "[email protected]"^^xsd:anyURI
We can agree that a literal is a property value. But it can be more difficult than that:
<Product1> <price> [ <value> 300 ; <currency> <euro> ]
What about <euro>? Can it be an entity? If so, then <currency> would be a relation, because a relation is between entities, not between properties and entities. But is the blank node also an entity? Then the whole price would be an entity.
Entity is not an RDF term. If we're talking ontological modeling, a related term would be class.
Sure you can call the price entity, and the euro as well. Why is that a problem?
After re-reading some of this thread, I notice that I missed a couple of questions from @amirouche a couple years ago. Sorry!
What is reification? I looked around, but I still don't understand.
See this brief explanation and this answer on Stack Overflow.
What are "n-ary relations", please?
See Defining N-ary Relations on the Semantic Web.
And addressing newer comments from @mhedenus:
How can you distinguish a property from a relation?
Can you please first explain what distinction you are trying to make between a "property" and a "relation"? AFAIK we do not have widely accepted standard definitions of those terms that clearly distinguish between them. If you could explain what distinction you are trying to make, it would be helpful.
Also, please explain what you mean by "entity", and why you think some things should be considered entities and some should not. When you wrote "they are entities i.e. they are things (resources) which have distinct existence" it sounds like you are using the term "entity" to mean what RDF calls a "resource". But then when you suggest that some things should be considered entities and some should not, that sounds different than the RDF notion of "resource", so I am confused. Can you explain what you mean by "entity" and how it is different from what RDF calls a "resource"?
Maybe I should clarify what this is all about. I have been an advocate of RDF since I learned about it 20 years ago (I have been programming since 1988 and doing Java development since 2000). I worked very hard to establish RDF as a technology in my company in the automotive industry. Currently, we use RDF primarily for data integration.
But can you use RDF for modelling, e.g. using RDFS or OWL? I think not.
The reality is: modelling is hard, especially because domain experts are normally not software developers. When you start talking about URIs, resources and stuff, they only understand blah blah blah. I believe the main reason why the adoption of RDF is so low (YES IT IS! There are still too many people out there who have never heard of it) is that people don't get it! RDF is extremely academic!
What people understand (even mechanical engineers) is ER modelling. They understand that there are things (entities or objects) which have properties (or attributes) and they have relations to other things.
Let's make an (over-)simplification here: there are two main graph modelling worlds:
- Property Graph == ER Modelling == Domain Modelling == {entities, relations, properties}
- RDF == {resources, predicates, literals}
Can these worlds be brought together? Yes. We have developed a graph model that is a Property Graph compatible with RDF. It is working and I believe that it can give benefits to the RDF world.
RDF talks about resources. Everything that can be identified with a URI is a resource. An email address is a resource. If you use an email address to denote a person, you write (as proposed by FOAF)
<mailto:[email protected]> a <Person> ; <name> "Alice"
So far so good. Now let's express the fact that "Alice has an email address":
<mailto:[email protected]> <mbox> <mailto:[email protected]>
That is legal in RDF and it makes complete sense in RDF. But these statements have different meanings which are only obvious to human readers. In the first statement the mailto URI is an identifier for something we call Alice; in the second statement the same URI is a value that belongs to a property owned by Alice.
Do you agree?
You have used the same URI, <mailto:[email protected]>, to denote both a person and a mailbox. That is a URI collision. According to the Web Architecture, you should have used different URIs, to avoid the problem that you're raising. RDF itself does not stop you from doing that, but that doesn't mean it's a good practice either.
But what does this have to do with implementing property graphs in RDF? I don't understand where you're going with this example.
Yes, this URI collision is not nice and should be avoided. This example is meant to demonstrate what I think is the stumbling block when you try to implement property graphs in RDF. When you try to map RDF to a property graph, you have to know whether the statement's object is another node (and therefore the predicate is a relation) or a property value (and therefore the predicate is a property type or key).
To say that all URIs are mapped to nodes in the property graph and ONLY statements with literals are properties would be an artificial restriction.
To solve this, some additional information is required that tells you which predicates are considered to be relations and which are considered to be properties.
I still don't get why developers' unfamiliarity with a technology is being framed as a deficiency of the technology, and not of the developers. This seems to be a constant theme for EasierRDF.
Many more developers know JavaScript than C++. Does that make C++ academic, and by that somehow deficient? Should we have EasierC++?
If developers are familiar with ER or UML or whatever, then provide mappings/converters to OWL/RDF(S). But don't use that as an opportunity to knock RDF.
@namedgraph I think I disagree with you fairly fundamentally about this. I think lack of uptake can be an important indicator that a technology is too hard to use. It certainly is not an absolute determinant though.
If you look at market shares, RDF databases are getting clobbered by property graph databases. You can claim that RDF does more than what Property Graphs can do -- and I agree -- but it isn't a huge difference, and apparently it isn't a difference that matters to many common use cases.
I want to improve RDF, not knock it. And that means being honest about its strengths and weaknesses. IMO its biggest weakness is its difficulty of use. If we can make it as easy to use as Property Graphs -- at least for use cases that do not need functionality beyond Property Graphs -- then I think that would be very beneficial for RDF. But as I said before "my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism".
@dbooth-boston we've been over this...
I'd like you to try the C++ analogy though. StackOverflow is full of questions "why is C++ so hard?" and yet some of the most critical software is written in it. How is this different from RDF?
This is a bit off topic, but I'll indulge your C++ analogy and try to answer. I think you are suggesting that, even though RDF is hard, it is still the right tool for the job sometimes, just as C++ is still the right tool for the job sometimes, even though it is hard. I definitely agree that RDF is sometimes the right tool for the job. (I would not have been involved with RDF for so many years if I didn't!)
But here is where I think the analogy breaks down. When C++ is chosen, the overriding reason is almost invariably performance. I don't believe anybody would choose C++ over Python (for example) if performance were not a key consideration. And the reason C++ is hard is that it is both a low-level, C-compatible programming language and a high-level object-oriented programming language. When performance is critical, there is no getting around the need for a low-level language like C. One could of course use C instead of C++, but the higher-level features of C++ allow for more programmer productivity while still giving access to the low-level features of C. In other words, programmers put up with C++'s difficulty because they NEED the low-level features that it provides.
In contrast, I do not believe that RDF is chosen because developers really NEED the low-level features that it provides. I believe we can produce a higher-level successor to RDF, that retains the power that we need, while making it easier to use.
As a case in point, I do not believe that we really NEED explicit blank nodes in RDF, i.e., blank nodes like _:b42 that cannot be represented by square brackets [] in Turtle. We could solve the same use cases if RDF did not have them, even though we might have to create a few Skolem URIs instead sometimes. Yet that one little feature -- the ability to write an explicit blank node -- places a disproportionate complexity burden on RDF users. Not only does that feature cause endless confusion to new RDF users (because blank node labels are not stable identifiers), but it is precisely the reason why, after over 20 years, we still do not have a standard way to canonicalize RDF!
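The point that blank node labels are not stable identifiers can be illustrated with a toy sketch: two graphs that differ only in their blank node labels compare unequal as sorted triple lists, and one naive fix is to try every relabelling and keep the smallest result. This brute force is exponential (factorial in the number of blank nodes) and is only meant to show why canonicalization is hard; it is not a practical algorithm. Triples are plain string tuples with `_:` marking blank nodes, an assumption of the sketch.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def relabel(graph: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    return [tuple(mapping.get(t, t) for t in triple) for triple in graph]

def canonical(graph: List[Triple]) -> List[Triple]:
    """Brute-force canonical form: try every blank-node labelling, keep the smallest.

    The cost grows factorially with the number of blank nodes.
    """
    bnodes = sorted({t for triple in graph for t in triple if is_bnode(t)})
    best = None
    for perm in permutations(range(len(bnodes))):
        mapping = {b: f"_:c{i}" for b, i in zip(bnodes, perm)}
        candidate = sorted(relabel(graph, mapping))
        if best is None or candidate < best:
            best = candidate
    return best

g1 = [("_:b42", "ex:name", '"Alice"'), ("_:b42", "ex:knows", "_:b7")]
g2 = [("_:x", "ex:name", '"Alice"'), ("_:x", "ex:knows", "_:y")]
print(sorted(g1) == sorted(g2))        # False: same graph, different labels
print(canonical(g1) == canonical(g2))  # True
```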
In short, the low-level features of C++ are essential to its users, but the low-level features of RDF are not essential. They only continue to exist because we have not yet developed a higher-level, easier-to-use successor.
Unless we succeed in making RDF considerably easier to use, I think RDF will eventually get squeezed out of the picture entirely, in favor of other graph approaches that are easier to use, even though those other graph approaches are not quite as powerful.
Dear @dbooth-boston and @namedgraph, the discussion is going in the wrong direction. I never intended to knock anybody or to start a fruitless discussion about what is better. When I say that RDF is academic, this is not a bad thing (being an academic myself), and I do not want to qualify anything. It's a simple truth that being brilliant is not all you need to be successful.
This is a thread about Property Graphs and RDF, so let's get back on track with what I consider a valid conceptual question. The main difference between RDF and Property Graphs is not that it is hard in RDF to attach predicates to a predicate; that can be done with reification. The main difference lies in the core graph model itself.
I painted a quick picture. Let's assume you have a domain model consisting of two types of things, A and B. You can create a UML class diagram or an ER diagram; both model the same situation.
Now consider an instance graph that contains the data modelled by the UML diagrams. The property graph is very straightforward. The RDF graph is more complex due to the atomic nature of its nodes: the type and member1 of the things in the model become nodes in the RDF graph.
Now imagine an implementation that has to map between the model elements (ER, class) and the instance graphs (PG, RDF). The mapping to RDF is more complicated: you must distinguish properties or attributes like the type and member1 from the relation or association relatesTo. They all become predicates in RDF, so mapping back from RDF to the model becomes somewhat ambiguous.
Now the question: if you are looking at the RDF graph, how can it be possible to see that member1 is meant to be a property/member/attribute and relatesTo is meant to be an association/relation ?
Please don't answer "you don't do that, you use RDFS or OWL"! This is a conceptual question, not a technical one.
One part of the answer surely is: If the subject of the RDF statement is an URI and the object of the RDF statement is a literal then the predicate is meant to be a property/attribute.
But is this sufficient? What about rdf:type? Are there special cases? Is some special information required? If so, how is it provided? Some special model, ontology...?
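A minimal sketch of the literal rule stated above, under the assumption that literals are written with a leading double quote (as in N-Triples) and everything else is a URI. The triples are hypothetical. It shows the gap the questions point at: `<mbox>` and `rdf:type` both come out as relations, even though one might intend them as properties.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def is_literal(term: str) -> bool:
    # Assumption: literals are quoted, as in N-Triples; everything else is a URI.
    return term.startswith('"')

def classify(triples: List[Triple]) -> Dict[str, str]:
    """Apply the literal rule: literal object => property, otherwise relation."""
    kinds: Dict[str, str] = {}
    for _, predicate, obj in triples:
        kind = "property" if is_literal(obj) else "relation"
        if kinds.get(predicate, kind) != kind:
            kind = "ambiguous"  # seen with both literal and URI objects
        kinds[predicate] = kind
    return kinds

triples = [
    ("<Alice>", "<name>", '"Alice"'),
    ("<Alice>", "<knows>", "<Bob>"),
    ("<Alice>", "<mbox>", "<mailto:alice@example.org>"),
    ("<Alice>", "rdf:type", "<Person>"),
]
print(classify(triples))
# {'<name>': 'property', '<knows>': 'relation', '<mbox>': 'relation', 'rdf:type': 'relation'}
```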

To put it more mathematically: The model-graphs and instance-property-graph are homomorphic, the RDF graph is not.
For the record, the W3C Cognitive AI Community Group is incubating a higher level approach to knowledge graphs that is easier to work with than JSON-LD whilst retaining mapping to RDF triples. It focuses on "chunks" as a collection of properties whose values are literals, references to other chunks or a sequence thereof. Your example becomes:
A c1 {member1 value1}
B c2 {}
c1 relatesTo c2
Chunks map to one or more RDF triples with a shared subject node. Chunk types and properties can be easily mapped to RDF URIs in a similar manner to JSON-LD. Knowledge engineers need to decide when to model something as a property or an explicit link. However, it is easy to promote a property to a link when needed.
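To show how the example above could map to triples with a shared subject, here is a small translator for a simplified subset of the chunk syntax. It is not the Cognitive AI Community Group's parser; the `BASE` namespace is hypothetical, properties are assumed to be "name value" pairs separated by ";", and values are kept as plain strings.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace for chunk names
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

CHUNK = re.compile(r"^(\w+)\s+(\w+)\s*\{(.*)\}$")  # type id {properties}
LINK = re.compile(r"^(\w+)\s+(\w+)\s+(\w+)$")       # subject property object

def chunks_to_triples(text: str) -> List[Triple]:
    """Translate a simplified chunk syntax into triples sharing subject nodes."""
    triples: List[Triple] = []
    for line in (ln.strip() for ln in text.splitlines()):
        if not line:
            continue
        m = CHUNK.match(line)
        if m:
            ctype, cid, body = m.groups()
            subject = BASE + cid
            triples.append((subject, RDF_TYPE, BASE + ctype))
            # Assumption: properties are "name value" pairs separated by ";".
            for prop in (p.strip() for p in body.split(";")):
                if prop:
                    name, value = prop.split(None, 1)
                    triples.append((subject, BASE + name, value))
            continue
        m = LINK.match(line)
        if not m:
            raise ValueError(f"unrecognised line: {line!r}")
        s, p, o = m.groups()
        triples.append((BASE + s, BASE + p, BASE + o))
    return triples

EXAMPLE = """
A c1 {member1 value1}
B c2 {}
c1 relatesTo c2
"""
for triple in chunks_to_triples(EXAMPLE):
    print(triple)
```

Note that `member1` and `relatesTo` both end up as ordinary triples on `c1`; only the chunk syntax recorded which one was a property and which an explicit link.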
Chunks has a broader scope than either RDF or Property Graphs, as it seeks to support general-purpose human-like AI, inspired by progress in the cognitive sciences and over 500 million years of neural evolution. Chunks are associated with sub-symbolic parameters that model human memory with respect to prior knowledge and past experience. The chunks rule language models the cortico-basal ganglia circuit in the brain. Sequential rule execution corresponds to the sequential nature of consciousness, and draws upon decades of work by John Anderson at CMU. By contrast, chunk databases support parallel execution of graph algorithms.
You would be right in thinking this work is still in its infancy, but general-purpose AI will be hugely disruptive, and both RDF and Property Graphs will be affected by the rise of machine learning. To quote William Gibson: "The future is already here – it's just not evenly distributed."
That is very exciting; maybe an AI can recognize it.
Knowledge engineers need to decide when to model something as a property or an explicit link.
That's getting to my point. How do you see in the RDF graph whether it is a property or a link?
@mhedenus and @draggett I agree with the general direction that you describe, which I see as the ability to manipulate small subgraphs of RDF as though they are single, indivisible objects (or "chunks" as @draggett calls them) -- making RDF a higher level language. I think three fundamental capabilities are needed to achieve this:
- Manipulate those chunks as indivisible objects;
- Compose a chunk from a subgraph -- i.e., "bless" it, to enable that subgraph to be manipulated as an indivisible chunk; and
- Decompose a chunk into its subgraph, i.e., access its parts.
In programming languages we do this routinely with data structures, but we don't (yet) have this capability in RDF.
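The three capabilities listed above can be sketched with an immutable, hashable value. This is a hypothetical design, not an existing RDF API: `compose` collects the root's triples plus those of blank nodes reachable from it (reusing the price example from earlier in the thread), `Chunk` is a frozen dataclass so it can be passed around and compared as one value, and `decompose` hands back the parts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

@dataclass(frozen=True)
class Chunk:
    """An immutable, hashable bundle of triples handled as one value."""
    root: str
    triples: FrozenSet[Triple]

def compose(root: str, graph: Set[Triple]) -> Chunk:
    """'Bless' the triples of root, plus those of blank nodes reachable from it."""
    selected: Set[Triple] = set()
    frontier, seen = [root], {root}
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if s == node:
                selected.add((s, p, o))
                # Follow blank-node objects only; named resources stay outside.
                if o.startswith("_:") and o not in seen:
                    seen.add(o)
                    frontier.append(o)
    return Chunk(root, frozenset(selected))

def decompose(chunk: Chunk) -> Set[Triple]:
    """Return a mutable copy of the chunk's triples, giving access to its parts."""
    return set(chunk.triples)

graph = {
    ("ex:Product1", "ex:price", "_:p"),
    ("_:p", "ex:value", "300"),
    ("_:p", "ex:currency", "ex:euro"),
    ("ex:euro", "ex:symbol", "EUR"),  # about the euro itself, so not in the chunk
}
chunk = compose("ex:Product1", graph)
print(len(chunk.triples))                       # 3
print(compose("ex:Product1", graph) == chunk)   # True: same content, same chunk
```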
That. is. a. giant. insightful thread.
Property graphs have the advantage that they require no particular knowledge to be understood or used: they are easy to draw and easy to explain. In particular, the API is simple, without much diversity or even room for creativity. The only roadblock, in my opinion, is GQL, inspired by Cypher, but that might be a particular bias of mine because I prefer high-level, domain-specific languages. At intermediate levels, so far, I prefer procedural approaches.
Something that may seem natural to a seasoned software practitioner might not be to a newbie. I think of trees as an example of a deep topic in software, even though they have a natural representation. The opposite also exists: tables. Tables do not occur in the natural environment, but thanks to massive education and tooling, they succeeded as a concept and became a common tool.
It was already said in this thread in other words: a property graph as a programming interface is less powerful (weaker) than RDF. Another way to explain the same idea, all things considered: RDF can implement property graphs; what RDF allows one to describe is a superset of what a property graph allows one to describe. Telling whether one is better than the other as a solution for building a particular product is another problem (one must dive into performance, culture, existing knowledge and features).
Indeed, the property graph model is easy on the mind, even more so thanks to UML, ER diagrams, etc. That does not necessarily mean I will use a property graph database for the implementation. A graph database is a tool for non-practitioners, and it helps software practitioners deliver more quickly.
What matters most is the conceptual tools that can be invented and the new thoughts that can be had. In this regard, property graphs did not help me: they are a direct mapping of existing, almost natural knowledge. RDF, by contrast, provides, in my opinion and experience, tools and grounds for creating and thinking about new ideas.
- Manipulate those chunks as indivisible objects;
- Compose a chunk from a subgraph -- i.e., "bless" it, to enable that subgraph to be manipulated as an indivisible chunk; and
- Decompose a chunk into its subgraph, i.e., access its parts.
That is exactly the description of a project I built, inspired by the atomspace. In my implementation, atoms had key-value pairs (properties) and zero or more incoming and outgoing links (links had no label or properties). Subgraphs were reified with an atom and links toward the atoms composing the subgraph. As I commented in the recursive / hierarchical RDF graph issue, that is very difficult to reason about. Maybe cogai chunks will make it practical.
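A rough sketch of the structure just described: atoms carry a property dict and unlabelled incoming and outgoing links, and a subgraph is reified by a new atom linking to its members. The `Atom`, `link` and `reify` names are hypothetical; this is not the OpenCog AtomSpace API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass(eq=False, repr=False)  # identity semantics; repr off to avoid cycles
class Atom:
    properties: Dict[str, Any] = field(default_factory=dict)
    outgoing: List["Atom"] = field(default_factory=list)
    incoming: List["Atom"] = field(default_factory=list)

def link(source: Atom, target: Atom) -> None:
    """Add an unlabelled, property-less link, recorded on both ends."""
    source.outgoing.append(target)
    target.incoming.append(source)

def reify(members: List[Atom], **properties: Any) -> Atom:
    """Stand for a subgraph with a new atom that links to each member atom."""
    group = Atom(dict(properties))
    for member in members:
        link(group, member)
    return group

alice = Atom({"name": "Alice"})
bob = Atom({"name": "Bob"})
link(alice, bob)
team = reify([alice, bob], kind="subgraph")
print([m.properties["name"] for m in team.outgoing])  # ['Alice', 'Bob']
print(len(bob.incoming))  # 2: one link from alice, one from the group atom
```

Because a group atom is itself an atom, groups can be nested inside other groups, which is where the reasoning difficulty mentioned above starts to appear.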
Wow, atomspace has some really interesting ideas! I hadn't looked at it before.
RDF can implement property graphs; what RDF allows one to describe is a superset of what a property graph allows one to describe.
Well, I think that is currently not completely true, because of the issue I laid out: RDF does not provide the fundamental distinction between relation and property found in common modelling schemes.
The solution we implemented is simple: when reading RDF you must specify a model that tells you how to interpret the predicates. The question remains what to do with predicates that are not specified in the model. The rules are:
- if the subject is a URI and the object is a URI the predicate is a relation unless specified otherwise
- if the object is a blank node the predicate is a property and the object is a complex property value (a structure)
- if the subject is a blank node the predicate is a (sub-)property of the complex property value (a structure)
Now you have a clean mapping to an ER/class model: the URIs that are subjects belong to entities (domain model objects), predicates between entity URIs are always relations, and there is no relation from within a complex property that points to another entity. In ER modelling terms, there is only a relationship between Alice and Bob, not between any parts of them. One interesting thing is that you can decide post facto whether to interpret a predicate as a relation or a property.
Is this "blank-node cluster" like the "chunk" mentioned by @draggett?
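The rules above can be sketched as a function that takes a model (predicate to "relation" or "property") and falls back to the default rules for unspecified predicates. This is not the implementation described in the comment; it additionally assumes that literal objects make a predicate a property, treats `_:`-prefixed terms as blank nodes and quoted terms as literals, lets repeated predicates overwrite, and does not detect blank-node cycles.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def is_literal(term: str) -> bool:
    return term.startswith('"')  # assumption: N-Triples-style quoted literals

def to_property_graph(triples: List[Triple], model: Dict[str, str]):
    """Map RDF triples to (nodes, edges) using a model plus default rules."""
    nodes: Dict[str, Dict[str, Any]] = {}
    edges: List[Triple] = []

    def value_of(term: str) -> Any:
        # A blank-node object becomes a nested dict (a complex property value);
        # its own predicates are sub-properties, never relations.
        if is_bnode(term):
            return {p: value_of(o) for s, p, o in triples if s == term}
        return term

    for s, p, o in triples:
        if is_bnode(s):
            continue  # handled inside value_of for the owning subject
        props = nodes.setdefault(s, {})
        kind = model.get(p)  # the model overrides the default rules
        if kind is None:
            kind = "property" if is_literal(o) or is_bnode(o) else "relation"
        if kind == "relation":
            nodes.setdefault(o, {})
            edges.append((s, p, o))
        else:
            props[p] = value_of(o)
    return nodes, edges

triples = [
    ("<Alice>", "<knows>", "<Bob>"),
    ("<Alice>", "<mbox>", "<mailto:alice@example.org>"),
    ("<Product1>", "<price>", "_:p"),
    ("_:p", "<value>", '"300"'),
    ("_:p", "<currency>", "<euro>"),
]
nodes, edges = to_property_graph(triples, model={"<mbox>": "property"})
print(edges)                # [('<Alice>', '<knows>', '<Bob>')]
print(nodes["<Product1>"])  # {'<price>': {'<value>': '"300"', '<currency>': '<euro>'}}
```

Changing the model, for example dropping the `<mbox>` entry, turns the mailbox back into a relation without touching the data, which is the post facto decision mentioned above.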