pygraphistry
Schema: Entity and relation taxonomy
Typed property graphs often have a taxonomy: nodes or relations with a given label typically carry specific fields, and those fields typically use specific data representations.
For example, Person nodes in an identity graph might look like:
class Person:
    name: str
    age: int
    addresses: List[str]
    alias: Optional[str]
Primitive types should match the database's native schema or, even better, Apache Arrow types.
Note the use of List for multi-valued fields and, where a field's type is heterogeneous, Optional, Union, etc.
In Cypher, that might correspond to (a: Person { name, age, addresses }).
Something similar might hold for a relationship, such as a friendship:
class IsFriends:
    first_met: datetime64[ms]
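As a minimal runnable sketch of the two examples above, the same shapes can be written as standard-library dataclasses and read back at runtime. The defaults and the use of `datetime` in place of `datetime64[ms]` are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, get_type_hints


@dataclass
class Person:
    name: str
    age: int
    addresses: List[str] = field(default_factory=list)
    alias: Optional[str] = None  # assumed default: most people have no alias


@dataclass
class IsFriends:
    # Assumption: datetime stands in for a datetime64[ms] column
    first_met: datetime


# get_type_hints recovers the declared field types as a plain dict
person_schema = get_type_hints(Person)
friend_schema = get_type_hints(IsFriends)
```

Reading the hints back this way gives one possible starting point for exporting the taxonomy to a machine-readable form.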
- We should get the node and edge ontology in a machine-readable format, e.g.:
class EntityRepresentation:
    attributes: Dict[str, FieldType]

PrimitiveTypes = Literal['str', 'int', ...]
CompoundTypes = Tuple[ Literal["Optional", "Union", "List"], Union[PrimitiveTypes, 'CompoundTypes'] ]
FieldType = Union[PrimitiveTypes, CompoundTypes]  # a field may also be a bare primitive, e.g. name: str

node_taxonomy : Dict[str, EntityRepresentation]
edge_taxonomy : Dict[str, EntityRepresentation]
There may be more convenient representations as well.
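One way such a representation could be produced is sketched below: Python type hints are encoded as nested tuples that serialize cleanly to JSON. The helper `encode_type` and the table `PRIMITIVE_NAMES` are hypothetical names, and two choices here are assumptions rather than part of the proposal: primitives are encoded as bare strings, and a `Union` tuple carries all of its members.

```python
import json
from typing import Any, List, Optional, Union, get_args, get_origin

# Hypothetical primitive table; a real one would mirror the database or Arrow types
PRIMITIVE_NAMES = {str: "str", int: "int", float: "float", bool: "bool"}


def encode_type(tp: Any):
    """Encode a typing annotation as a primitive name or a (kind, ...) tuple."""
    if tp in PRIMITIVE_NAMES:
        return PRIMITIVE_NAMES[tp]
    origin, args = get_origin(tp), get_args(tp)
    if origin is list:
        return ("List", encode_type(args[0]))
    if origin is Union:
        # Optional[X] is Union[X, None], so split off NoneType first
        members = [a for a in args if a is not type(None)]
        if len(members) == 1:
            inner = encode_type(members[0])
        else:
            inner = ("Union", *map(encode_type, members))
        return ("Optional", inner) if len(members) < len(args) else inner
    raise TypeError(f"unsupported annotation: {tp!r}")


person_fields = {
    "name": str,
    "age": int,
    "addresses": List[str],
    "alias": Optional[str],
}
node_taxonomy = {
    "Person": {key: encode_type(tp) for key, tp in person_fields.items()}
}
taxonomy_json = json.dumps(node_taxonomy)  # tuples become JSON arrays
```

Nested tuples keep the format language-neutral once serialized, at the cost of losing Python-level type checking on the encoded form.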
- Databases may have 1B+ elements, so schema inference should still work at that scale.
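To illustrate how inference might stay tractable at that scale, here is a sketch of a single-pass, streaming approach whose memory grows with the number of distinct (label, field, type) combinations rather than with the number of rows. The function names, the `(label, properties)` record shape, and the `"unknown"` placeholder for empty lists are all assumptions; sampling would be another option.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple


def union_of(kinds):
    """Collapse a set of encodings into one encoding, or a ('Union', ...) tuple."""
    ordered = sorted(kinds, key=repr)  # repr gives a stable order for str/tuple mixes
    if not ordered:
        return "unknown"  # assumed placeholder, e.g. for empty lists
    return ordered[0] if len(ordered) == 1 else ("Union", *ordered)


def observe(value: Any):
    """Encode one value as a primitive name or a ('List', inner) tuple."""
    if isinstance(value, list):
        return ("List", union_of({observe(v) for v in value}))
    return type(value).__name__


def infer_taxonomy(records: Iterable[Tuple[str, Dict[str, Any]]]):
    totals = defaultdict(int)                         # label -> rows seen
    seen = defaultdict(lambda: defaultdict(set))      # label -> field -> encodings
    present = defaultdict(lambda: defaultdict(int))   # label -> field -> non-null rows
    for label, props in records:                      # one pass; works on generators
        totals[label] += 1
        for key, value in props.items():
            if value is None:
                continue
            seen[label][key].add(observe(value))
            present[label][key] += 1
    taxonomy = {}
    for label, fields in seen.items():
        taxonomy[label] = {}
        for key, kinds in fields.items():
            kind = union_of(kinds)
            if present[label][key] < totals[label]:   # missing or null in some rows
                kind = ("Optional", kind)
            taxonomy[label][key] = kind
    return taxonomy


rows = [
    ("Person", {"name": "Ann", "age": 30, "addresses": ["1 Main St"]}),
    ("Person", {"name": "Bob", "age": None, "addresses": ["2 Oak Ave"], "alias": "Bobby"}),
]
taxonomy = infer_taxonomy(iter(rows))
```

Because the function only iterates, the rows could come from a database cursor or a sampled scan without being loaded into memory first. Fields that are null in every row never appear in the result, which a fuller version would need to handle.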