Composite primary keys

Open • sntran opened this issue 2 years ago • 3 comments

There is a case in which a schema does not have its own primary key. Instead, it uses two secondary indices as key. For example:

import { z } from "https://deno.land/x/zod/mod.ts";

export const Edge = z.object({
  type: z.string().optional(),
  sourceId: z.string().describe("index"),
  targetId: z.string().describe("index"),
});

Right now, pentagon stores the edge under two keys, ["edges_by_sourceId", sourceId] and ["edges_by_targetId", targetId]. This is not ideal, because querying on these relations does not behave the way one would expect. For example, with a Node schema like the one below:

export const Node = z.object({
  id: z.string().describe("primary, unique"),
  name: z.string().optional(),
});

To query the edges starting from a Node, one would need to define the relation like so:

nodes: {
  schema: Node,
  relations: {
    edges: ["edges_by_sourceId", [Edge], undefined, "sourceId"],
  },
},

Even so, other queries do not seem to work well.

I looked at Prisma, and it appears to support composite primary keys. This might map nicely onto Deno KV if we used ["edges", sourceId, targetId] as the key.
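A minimal sketch of the composite-key layout proposed above. To keep it runnable without `Deno.openKv()`, it uses a hypothetical in-memory `MemoryKv` class as a stand-in for Deno KV; with real Deno KV the listing would be `kv.list({ prefix: ["edges", "a"] })`, which returns an async iterator. The `Edge` interface mirrors the zod schema above, and the node ids are made up.

```typescript
type Key = string[];

interface Edge {
  type?: string;
  sourceId: string;
  targetId: string;
}

// Hypothetical in-memory stand-in for Deno KV: keys are string tuples and
// list() returns every entry whose key starts with the given prefix.
class MemoryKv {
  private entries = new Map<string, { key: Key; value: Edge }>();

  set(key: Key, value: Edge): void {
    // JSON-encode the tuple so equal keys land in the same slot
    // (a second write with the same key overwrites the first).
    this.entries.set(JSON.stringify(key), { key, value });
  }

  get(key: Key): Edge | undefined {
    return this.entries.get(JSON.stringify(key))?.value;
  }

  list(prefix: Key): { key: Key; value: Edge }[] {
    return [...this.entries.values()]
      .filter((e) => prefix.every((part, i) => e.key[i] === part))
      // Rough approximation of key order; Deno KV orders keys itself.
      .sort((a, b) => (a.key.join("\u0000") < b.key.join("\u0000") ? -1 : 1));
  }
}

// The composite primary key: the (sourceId, targetId) pair identifies one edge.
function edgeKey(edge: Edge): Key {
  return ["edges", edge.sourceId, edge.targetId];
}

const kv = new MemoryKv();
for (const e of [
  { sourceId: "a", targetId: "b", type: "follows" },
  { sourceId: "a", targetId: "c" },
  { sourceId: "b", targetId: "c" },
]) {
  kv.set(edgeKey(e), e);
}

// All edges leaving node "a" share the prefix ["edges", "a"].
const outgoingFromA = kv.list(["edges", "a"]).map((e) => e.value.targetId); // ["b", "c"]
```

Because the key is the pair itself, writing the same (sourceId, targetId) twice overwrites the entry, which is the uniqueness a composite primary key provides. Looking edges up by targetId alone would still need a second, inverse key.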

Not sure how to implement such a thing, but it's an idea.

sntran • Jun 18 '23 18:06

Yeah, I guess this sort of goes hand-in-hand with Prisma's implicit many-to-many relations?

Do I understand correctly that you want to be able to query many .nodes from edges, and many .edges from nodes?

skoshx • Jun 18 '23 18:06

That's right. The edges carry additional data, such as the direction between the nodes, but it is a many-to-many relation.

sntran • Jun 18 '23 20:06
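One way to answer both directions of that many-to-many question, sketched under the assumption that each edge is written twice: once under the composite key and once under an inverse index keyed by target. The key names (`edges`, `edges_by_target`) and the helper functions are hypothetical, and a plain `Map` stands in for Deno KV. With real Deno KV, both writes would go in a single `kv.atomic().set(...).set(...).commit()` so the index cannot drift from the primary entry.

```typescript
type Key = string[];

// Hypothetical store: a plain Map keyed by the JSON-encoded tuple.
const store = new Map<string, Key>();

function put(key: Key): void {
  store.set(JSON.stringify(key), key);
}

// Return every stored key that starts with `prefix`.
function scan(prefix: Key): Key[] {
  return [...store.values()].filter((k) =>
    prefix.every((part, i) => k[i] === part)
  );
}

// Write one edge under its composite key plus an inverse index, so each
// direction of the many-to-many relation is a single prefix scan.
function addEdge(sourceId: string, targetId: string): void {
  put(["edges", sourceId, targetId]);
  put(["edges_by_target", targetId, sourceId]);
}

// Nodes reachable from `id` (outgoing edges).
function targetsOf(id: string): string[] {
  return scan(["edges", id]).map((k) => k[2]).sort();
}

// Nodes pointing at `id` (incoming edges).
function sourcesOf(id: string): string[] {
  return scan(["edges_by_target", id]).map((k) => k[2]).sort();
}

addEdge("n1", "n2");
addEdge("n1", "n3");
addEdge("n4", "n2");

const fromN1 = targetsOf("n1"); // ["n2", "n3"]
const intoN2 = sourcesOf("n2"); // ["n1", "n4"]
```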

Naive question: why not store edges directly, graph-style, in pentagon?

Storing many-to-many edges as many one-to-one relations (I'm ignoring the inverse indexes, btw):

// Schema layer. The parts in caps are fixed and pentagon-related; the rest is user-defined.
// Node types
["SCHEMA","NODE_TYPE","Person"] <> null
["SCHEMA","NODE_TYPE","Book"] <> null
// Relation roles, and the node type that plays each role
["SCHEMA","RELATION_ROLES","authorship"] <> ["writer","book"]
["SCHEMA","ROLE_PLAYERS","authorship","writer"] <> "Person"
["SCHEMA","ROLE_PLAYERS","authorship","book"] <> "Book"
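A sketch of what the ROLE_PLAYERS entries buy you: before writing an edge, the store can check that every role is filled by a node of the declared type. The schema entries above are loaded here as plain objects; the `nodeTypes` lookup and the `checkEdge` helper are hypothetical and not part of pentagon.

```typescript
// The schema layer above, as plain data (hypothetical in-memory shape).
const relationRoles: Record<string, string[]> = {
  authorship: ["writer", "book"],
};
const rolePlayers: Record<string, Record<string, string>> = {
  authorship: { writer: "Person", book: "Book" },
};

// Hypothetical lookup from node id to node type.
const nodeTypes: Record<string, string> = {
  person1: "Person",
  person2: "Person",
  book1: "Book",
  book2: "Book",
};

// Returns a list of problems; an empty list means the edge fits the schema.
function checkEdge(relation: string, players: Record<string, string>): string[] {
  const roles = relationRoles[relation];
  if (!roles) return [`unknown relation "${relation}"`];
  const errors: string[] = [];
  for (const role of roles) {
    const nodeId = players[role];
    if (nodeId === undefined) {
      errors.push(`missing role "${role}"`);
      continue;
    }
    const expected = rolePlayers[relation][role];
    const actual = nodeTypes[nodeId] ?? "unknown";
    if (actual !== expected) {
      errors.push(`role "${role}" expects ${expected}, got ${actual}`);
    }
  }
  // Reject roles the relation does not declare.
  for (const role of Object.keys(players)) {
    if (!roles.includes(role)) errors.push(`unexpected role "${role}"`);
  }
  return errors;
}

const ok = checkEdge("authorship", { writer: "person1", book: "book1" }); // []
const bad = checkEdge("authorship", { writer: "book1", book: "book2" });
// ['role "writer" expects Person, got Book']
```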

// Data layer

["EDGES","authorship","1","writer"] <> "person1" // the instance id "1" could be a key assigned to a composite key
["EDGES","authorship","1","book"] <> "book1"

["EDGES","authorship","2","writer"] <> "person1"
["EDGES","authorship","2","book"] <> "book2"

["EDGES","authorship","3","writer"] <> "person2"
["EDGES","authorship","3","book"] <> "book1"

This structure enables relations with more than 2 roles, and polymorphism for cases that require it.
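A minimal sketch of encoding and decoding that layout: each relation instance becomes one entry per role, and a prefix scan over ["EDGES", relation] can be regrouped by instance id. The helper names are hypothetical, and the three-role "review" relation (with a made-up "editor" role) is only there to show that extra roles need no change to the key shape.

```typescript
type Key = string[];
type Entry = { key: Key; value: string };

// One relation instance becomes one entry per role:
// ["EDGES", relation, instanceId, role] -> playerId
function encodeEdge(relation: string, id: string, players: Record<string, string>): Entry[] {
  return Object.entries(players).map(([role, player]) => ({
    key: ["EDGES", relation, id, role],
    value: player,
  }));
}

// Regroup scanned entries into instances: instanceId -> { role: playerId }.
function decodeEdges(entries: Entry[]): Map<string, Record<string, string>> {
  const byId = new Map<string, Record<string, string>>();
  for (const { key, value } of entries) {
    const [, , id, role] = key;
    const players = byId.get(id) ?? {};
    players[role] = value;
    byId.set(id, players);
  }
  return byId;
}

const entries: Entry[] = [
  ...encodeEdge("authorship", "1", { writer: "person1", book: "book1" }),
  ...encodeEdge("authorship", "2", { writer: "person1", book: "book2" }),
  ...encodeEdge("authorship", "3", { writer: "person2", book: "book1" }),
  // Hypothetical three-role relation: same key shape, one more entry.
  ...encodeEdge("review", "1", { reviewer: "person2", book: "book1", editor: "person1" }),
];

// Stand-in for a prefix scan over ["EDGES", "authorship"].
const authorship = decodeEdges(entries.filter((e) => e.key[1] === "authorship"));

// Books written by person1.
const booksByPerson1 = [...authorship.values()]
  .filter((p) => p.writer === "person1")
  .map((p) => p.book); // ["book1", "book2"]
```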

In its most basic version, it does not require naming the relation, the roles, or the players in the schema. In "direct relations" (2 roles, 2 players) between two nodeTypes, the roles can default to "from" and "to", and the relation can be identified by combining the keys of the two player nodeTypes with a counter of the relations between those two nodeTypes. The counter is needed in case you create multiple relations, for instance person (as author)-book and person (as reviewer)-book: Person-Book alone would not be enough to know which relation you're targeting.

["EDGES","Person-Book-1","1","from"]<>"person1"
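A sketch of that default naming, assuming relation names are built as "<From>-<To>-<n>" with the default roles "from" and "to". Both helpers are hypothetical, and the counting assumes node type names contain no "-".

```typescript
type Entry = { key: string[]; value: string };

// Name the n-th direct relation between two node types "<From>-<To>-<n>".
// Assumes node type names contain no "-", otherwise prefixes could collide.
function defaultRelationName(from: string, to: string, existing: string[]): string {
  const prefix = `${from}-${to}-`;
  const count = existing.filter((name) => name.startsWith(prefix)).length;
  return `${prefix}${count + 1}`;
}

// A direct relation has exactly two roles, defaulting to "from" and "to".
function directEdgeEntries(relation: string, id: string, fromId: string, toId: string): Entry[] {
  return [
    { key: ["EDGES", relation, id, "from"], value: fromId },
    { key: ["EDGES", relation, id, "to"], value: toId },
  ];
}

const relations: string[] = [];
const authored = defaultRelationName("Person", "Book", relations); // "Person-Book-1"
relations.push(authored);
const reviewed = defaultRelationName("Person", "Book", relations); // "Person-Book-2"
relations.push(reviewed);

// entries[0] is { key: ["EDGES", "Person-Book-1", "1", "from"], value: "person1" }
const entries = directEdgeEntries(authored, "1", "person1", "book1");
```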

In the meta-case of defining nodes and edges, you would probably not even need that schema, since the data would already be a graph.

But you could build your own Edge and Node entities by creating two relations, one per direction (incoming and outgoing). This is a bit weird to read, since we are using nodes and edges to define the entity types Node and Edge.

Get ready for the inception time:

Schema layer

// Entities
["SCHEMA","NODE_TYPE","Node"] <> null
["SCHEMA","NODE_TYPE","Edge"] <> null

// Relations
["SCHEMA","RELATION_ROLES","incoming"] <> ["edge","node"]
["SCHEMA","ROLE_PLAYERS","incoming","edge"] <> "Edge"
["SCHEMA","ROLE_PLAYERS","incoming","node"] <> "Node"

["SCHEMA","RELATION_ROLES","outgoing"] <> ["edge","node"]
["SCHEMA","ROLE_PLAYERS","outgoing","edge"] <> "Edge"
["SCHEMA","ROLE_PLAYERS","outgoing","node"] <> "Node"

And the stored data would look something like this:

const edge1 = { id: "edge1", type: "main", sourceId: "node1", targetId: "node2" };

//edges
["EDGES","incoming","1","edge"] <> "edge1"
["EDGES","incoming","1","node"] <> "node1"

["EDGES","outgoing","1","edge"] <> "edge1"
["EDGES","outgoing","1","node"] <> "node2"
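A sketch of the round trip for `edge1`: split the record into the "incoming" and "outgoing" entries shown above, then recover its endpoints from those entries alone. The helper names are hypothetical; where the `type` field is stored is not specified above, so this sketch assumes it stays on the Edge entity record and is not encoded in the relation entries.

```typescript
interface EdgeRecord {
  id: string;
  type: string; // assumed to live on the Edge entity, not in the relations
  sourceId: string;
  targetId: string;
}

type Entry = { key: string[]; value: string };

// "incoming" links the edge to its source node, "outgoing" to its target node.
function toRelationEntries(edge: EdgeRecord, instanceId: string): Entry[] {
  return [
    { key: ["EDGES", "incoming", instanceId, "edge"], value: edge.id },
    { key: ["EDGES", "incoming", instanceId, "node"], value: edge.sourceId },
    { key: ["EDGES", "outgoing", instanceId, "edge"], value: edge.id },
    { key: ["EDGES", "outgoing", instanceId, "node"], value: edge.targetId },
  ];
}

// Find the instance of `relation` whose "edge" role is `edgeId`,
// then return the node playing the "node" role in that same instance.
function endpointsOf(entries: Entry[], edgeId: string) {
  const nodeIn = (relation: string): string | undefined => {
    const hit = entries.find(
      (e) => e.key[1] === relation && e.key[3] === "edge" && e.value === edgeId,
    );
    if (!hit) return undefined;
    return entries.find(
      (e) => e.key[1] === relation && e.key[2] === hit.key[2] && e.key[3] === "node",
    )?.value;
  };
  return { sourceId: nodeIn("incoming"), targetId: nodeIn("outgoing") };
}

const edge1: EdgeRecord = { id: "edge1", type: "main", sourceId: "node1", targetId: "node2" };
const entries = toRelationEntries(edge1, "1");
const ends = endpointsOf(entries, "edge1"); // { sourceId: "node1", targetId: "node2" }
```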

But again, if pentagon becomes a bit more "graphy", you will probably not even need to define Edge and Node yourself.

lveillard • Jul 17 '23 00:07