dstlr
Refactoring relation and fact schema
Currently, our schema for relations and facts looks something like this:
There's an asymmetry here: relations are reified with an explicit relation node, but facts are not. We should refactor to make the two more consistent.
(Also, to me, the object_of relation has its directionality reversed.)
In introducing an intermediate relation node for the ground-truth, do we want the type to be CITY_OF_HEADQUARTERS (CoreNLP) or P159 (Wikidata)?
An argument for CITY_OF_HEADQUARTERS is that the queries are cleaner, since we can match on nodes of the same type. The argument against is that we lose the Wikidata property information (does this even matter?).
An argument for P159 is that we keep the Wikidata property and can trace a fact back to where it came from. The cost is messier queries, because each query needs to know, and include, the mapping between CoreNLP <-> Wikidata.
I'm leaning toward P159. This leaves open the possibility that a relation might not align perfectly with a fact property, so we can't do this mapping up front.
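To make the tradeoff concrete, here is a rough Python sketch of what matching at query time could look like if the fact node keeps P159. Only the CITY_OF_HEADQUARTERS <-> P159 pair comes from this thread; the dict layout, field names, and function names are hypothetical and not part of the current codebase.

```python
# Hypothetical alignment table. Only this one pair is taken from the discussion;
# a relation with no entry simply never matches a fact.
CORENLP_TO_WIKIDATA = {"CITY_OF_HEADQUARTERS": "P159"}


def aligned(relation_type, fact_property, mapping=CORENLP_TO_WIKIDATA):
    """Return True when an extracted relation type maps to the fact property."""
    return mapping.get(relation_type) == fact_property


def align(relations, facts, mapping=CORENLP_TO_WIKIDATA):
    """Pair each extracted relation with the facts about the same subject
    whose Wikidata property aligns with the relation type."""
    pairs = []
    for rel in relations:
        for fact in facts:
            same_subject = rel["subject"] == fact["subject"]
            if same_subject and aligned(rel["type"], fact["property"], mapping):
                pairs.append((rel, fact))
    return pairs
```

Because the mapping is applied when matching rather than when loading, a relation that does not align with any property stays in the graph untouched and just produces no pairs.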
So, just to be concrete, the tweak we are suggesting is to take the fact (Q355 "Facebook", hq, Q74195 "Menlo Park"), introduce an intermediate FACT node, and create:
- (Q355, subject-of, FACT[type:hq])
- (Q74195, object-of, FACT[type:hq])
This also allows the mention "Menlo Park" in text to be linked to Q74195.
Furthermore, I would rename these edges to has-subject and has-object, so that it is obvious that the FACT or RELATION node comes first in the triple.
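A minimal sketch of the reified form with the has-subject / has-object direction, in Python rather than as a graph query. The `FactNode` class, the `reify_fact` helper, and the `fact_id` argument are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactNode:
    """Intermediate node that reifies a ground-truth fact (hypothetical shape)."""
    node_id: str
    type: str  # e.g. "hq", or a Wikidata property such as "P159"


def reify_fact(fact_id, subject, prop, obj):
    """Turn (subject, prop, obj) into a FACT node plus two edges.

    The FACT node is always in first position, so both edges point
    from the fact to the entities it connects.
    """
    fact = FactNode(node_id=fact_id, type=prop)
    triples = [
        (fact.node_id, "has-subject", subject),
        (fact.node_id, "has-object", obj),
    ]
    return fact, triples


fact, triples = reify_fact("fact-1", "Q355", "hq", "Q74195")
for triple in triples:
    print(triple)
```

With this shape a RELATION node extracted from text could carry the same two edge types, which removes the asymmetry between relations and facts.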