pygraphistry

Schema: Entity and relation taxonomy

Open lmeyerov opened this issue 2 years ago • 0 comments

Typed property graphs often follow a taxonomy: nodes or relations with a given label typically carry specific fields, and those fields typically use specific data representations.

For example, Person nodes in an identity graph might look like:

class Person:
  name: str
  age: int
  addresses: List[str]
  alias: Optional[str]

Primitive types should match the database's native schema or, even better, Apache Arrow types.

Note the use of List and, where a field is heterogeneous, Optional, Union, etc.

In Cypher, that might correspond to (a: Person { name, age, addresses })
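As a rough illustration of how annotations like the Person class above could be turned into a machine-readable form, here is a minimal sketch using only the standard `typing` helpers. The helper names `encode_type` and `encode_entity`, and the nested-tuple layout they emit, are assumptions made for this example and not an existing PyGraphistry API.

```python
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


class Person:
    name: str
    age: int
    addresses: List[str]
    alias: Optional[str]


def encode_type(tp):
    """Encode an annotation as a primitive name or a nested tuple (hypothetical format)."""
    origin = get_origin(tp)
    if origin is list:
        (item,) = get_args(tp)
        return ("List", encode_type(item))
    if origin is Union:
        args = get_args(tp)
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) == 1:
            inner = encode_type(non_none[0])
        else:
            inner = ("Union", *[encode_type(a) for a in non_none])
        # Optional[X] is spelled Union[X, None] internally
        return ("Optional", inner) if len(non_none) < len(args) else inner
    return tp.__name__  # plain classes such as str or int


def encode_entity(cls):
    """Map each annotated field of a class to its encoded type."""
    return {field: encode_type(tp) for field, tp in get_type_hints(cls).items()}


person_schema = encode_entity(Person)
```

Under these assumptions, `person_schema` maps `addresses` to `("List", "str")` and `alias` to `("Optional", "str")`, while `name` and `age` stay as the bare names `"str"` and `"int"`.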

Something similar might be true of a relationship, such as a friendship:

class IsFriends:
  first_met: datetime64[ms]

  • We should get the edge and node ontology in a machine-understandable format, e.g.:

class EntityRepresentation:
   attributes: Dict[str, FieldType]

PrimitiveTypes = Literal['str', 'int', ...]
CompoundTypes = Tuple[ Literal["Optional", "Union", "List"],  Union[PrimitiveTypes, 'CompoundTypes'] ]
FieldType = Union[PrimitiveTypes, CompoundTypes]


node_taxonomy : Dict[str, EntityRepresentation]
edge_taxonomy : Dict[str, EntityRepresentation]

There may be more convenient representations as well.

  • Databases may have 1B+ elements, so inference should still work at that scale
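One way inference could stay tractable at that size is a single streaming pass (or a pass over a sample) that keeps only a small per-label summary instead of the data itself. The sketch below is a stdlib-only illustration under stated assumptions: records arrive as `(label, properties)` pairs, `None` is treated as a missing value, and the function names and nested-tuple output layout are hypothetical rather than part of any existing API.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def type_name(value: Any):
    """Name a value's type; lists become ("List", item_type)."""
    if isinstance(value, list):
        items = sorted({type_name(v) for v in value}, key=repr)
        if not items:
            return ("List", "unknown")  # empty list: item type not observable
        inner = items[0] if len(items) == 1 else ("Union", *items)
        return ("List", inner)
    return type(value).__name__


def infer_taxonomy(records: Iterable[Tuple[str, Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Infer per-label field types in one pass over an iterator of records."""
    seen = defaultdict(lambda: defaultdict(set))     # label -> field -> observed type names
    present = defaultdict(lambda: defaultdict(int))  # label -> field -> non-null count
    totals = defaultdict(int)                        # label -> record count
    for label, props in records:
        totals[label] += 1
        for field, value in props.items():
            if value is None:
                continue  # assumption: None means the field is absent
            seen[label][field].add(type_name(value))
            present[label][field] += 1

    taxonomy = {}
    for label, fields in seen.items():
        attrs = {}
        for field, types in fields.items():
            ordered = sorted(types, key=repr)
            tp = ordered[0] if len(ordered) == 1 else ("Union", *ordered)
            if present[label][field] < totals[label]:
                tp = ("Optional", tp)  # missing on at least one record
            attrs[field] = tp
        taxonomy[label] = attrs
    return taxonomy


nodes = [
    ("Person", {"name": "Ada", "age": 36, "addresses": ["1 Main St"], "alias": None}),
    ("Person", {"name": "Bob", "age": 41, "addresses": ["2 Oak Ave", "3 Elm Rd"], "alias": "bobby"}),
]
node_taxonomy = infer_taxonomy(iter(nodes))
```

Memory use here grows with the number of distinct (label, field, type) combinations rather than with the number of elements, so the same loop could in principle be fed from a database cursor or a sampled query. Fields that are null on every record never appear in the result, which a real implementation would likely need to handle.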

lmeyerov, May 13 '23 02:05