typedb icon indicating copy to clipboard operation
typedb copied to clipboard

TypeDB 3.0: Indexed relations

Open flyingsilverfin opened this issue 1 year ago • 0 comments

Problem to Solve

One major performance bottleneck of TypeDB is the structure of relations: to traverse through relation, we always have to go from a player, to the role players we are interested in. However, most of the time we don't actually need to visit a relation, and only pass through it: match ($x, $y) isa relation.

Additionally, most relations are small (< 5 role players). We can combine these properties, along with schema-level cardinality restrictions, to automatically create index-index shortcut through relations that are guaranteed to be small.

Proposed Solution

When a relation type can not have more than a fixed upper bound of role players (we will consider 8 for now), we will automatically index player-player pairs. This is an O(N^2) storage requirement, which is why we only want to do it for small relations. The maximum size of a relation can be derived from the @card (https://github.com/vaticle/typeql/issues/34) annotations on roles.

These indices will be automatically created when relations are inserted/deleted/updated, etc.

When the schema is modified, it is possible that the maximum indexed relation size is breached. This means that the indices should be marked invalid and deleted in the background. Vice versa, if the schema is updated such that relations are shrunk to less than the maximum indexed relation size, we should index all relations in the background and mark the relation type as indexed when complete. In the meantime, any queries and traversals will fall back to the longer traversals through the relation itself. The server should log indexing operations running/completed, and we can eventually add an API in the client to query the database indices status.

Storage size implications

Let's assume an average case of 3 players and worst case of 8. Index shortcuts will be bi-directional. 3 players = 3*2 = 6 index entries 8 = 8 * 7 = 56 index entries

Note that these entries will be relatively large: [player IID] + [player IID] + [Role Type IID] + [Role Type IID] + [Relation IID].

If we compare this this with the virtualisation of role instances, which incur 5 entries per role player, we see: 3 players = 35 role instance entries = 15 small entries 8 players = 85 = 40 small entries

These entries are approximate [player IID] + [Role Type IID] + [Relation IID], so about half the size of the player-player index entries.

We can see that in the average case, the player-player indices are about the same size as the role instances, and in the worst case (8 players) about twice the size, which is an acceptable storage cost.

flyingsilverfin avatar Mar 17 '23 10:03 flyingsilverfin