json-joy icon indicating copy to clipboard operation
json-joy copied to clipboard

JSON CRDT mapping onto SQL-like tables

Open streamich opened this issue 1 year ago • 0 comments

Design a storage format which combines indexed and sidecar formats: store each CRDT node separately and also separate the storage of view and sidecar for each CRDT node.

Storing view of each CRDT node in its own row cell allows user to perform any SQL query on that data.

This way JSON CRDT documents can be mapped onto a SQL table, where any SQL query can be run against the fields which store the node views. Also, it will allow to mutate only columns which are affected by a change.

Unlike the indexed format, the indexed-with-sidecar format needs to index by view location, e.g. /address/zip instead of node ID (a.1).

The SQL-like database may not be able to create new columns for dynamically created JSON CRDT nodes. May need a catch-all blob, which serializes all untracked nodes.


Alternatively come up with a format where multiple JSON CRDT documents share metadata (clock table). This way each row cell is standalone JSON CRDT document sans the clock table, which is shared for the whole row. In this approach, each JSON CRDT document has to also support the sidecar encoding format.

Changes needed:

  1. Take out the clock table part out of the sidecar format.
    • Ability to plug in the clock table when decoding.
    • Ability to encode the clock table standalone (already possible in indexed format).
  2. Each change (set of Patchs) has to also specify the column to which it has to be applied.

The 2. can be done in userspace. Essentially, this approach needs to "just" take out the clock table encoding from the sidecar format, and dependency inject the clock table on-demand.

To map JSON CRDT document nodes to a database row: one would need to construct the document schema using a reserved sid and then provide a mapping from node IDs to database columns.

Subsequent datase schema changes can be further done by changing the document schema using the reserved sid and updating the node-to-column mapping.

The node-to-column mapping can be specified using JSON Expressions.

streamich avatar Jul 28 '24 07:07 streamich