json-joy Peritext for JSON CRDT

[ ] Implement peritext or peristring text type, see Peritext post and paper.

Block element implementation is described here.

Nov 27 '22 18:11 streamich

Peritext proposes multiple new operations:

Inline
- addMark
- removeMark
Block
- splitBlock
- joinBlock
- updateBlock

Consider whether fewer operations can be used.

As inline elements are not actual RGA elements, a new concept may need to be introduced for them.
All block operations can likely be represented by a single invisible "marker" element which links to another node in the document.

Possibly both scenarios can be expressed by a single "slice" concept (analogous to Peritext markers, but linking to another node in the document). If a slice starts and ends at the same spot, it degrades to a "marker"? A splitBlock could be represented by an invisible or deleted character to which a slice is attached (the start and end of the slice map to the same character).

Nov 27 '22 22:11 streamich

Come up with a good name for the "slices":

Interval
Segment
Range
Slice (the whole concept could be called "slice", but the payload could be called "mark")
Mark
Annotation
Markup
Span

Come up with a good name for the "payload" of a "slice":

Payload
Mark
Marker
Value
Decorator
Decoration
Annotation
Tag

A deletion of a slice could be supported. Deletion is marking the slice with a tombstone. Rather than creating a separate operation for it, the tombstone could be contained in the mark/payload. If payload is set to undefined, that means the slice has been deleted? More complications arise from the type of the mark:

If the mark is a LWW-Register, then we can always set its value to undefined.
If the mark is a Const node, then its value can never be changed, hence it can never be deleted.

In general, likely there is no necessity to support "deletion", i.e. ability to add tombstones on the RGA level. Instead, maybe the rich-text layer (Peritext) can mark ranges/slices with tombstones, if needed.

Some text annotations are very small, so the ability to delete them might not be beneficial. For example, making text bold could be achieved by a slice with a single-byte payload (say, an integer that represents bold formatting). In that case, deleting the slice would not be much more efficient than adding another overlay slice that reverses the bold formatting. However, it would probably still be useful to make the RGA level aware of deleted slices, so that once a slice is deleted, the RGA level no longer reveals it to the higher level.
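
A minimal sketch of the "tombstone inside the payload" idea, assuming the mark is a last-write-wins register. `LwwRegister`, `MarkedSlice` and `visibleSlices` are hypothetical names for illustration, not json-joy APIs, and the numeric `start`/`end` fields stand in for real endpoints:

```typescript
// Hypothetical LWW register: the write with the highest timestamp wins.
// Simplification: ties are not broken by session ID, as a real CRDT would do.
class LwwRegister<T> {
  private time = 0;
  private value: T | undefined = undefined;

  set(value: T | undefined, time: number): void {
    if (time > this.time) {
      this.time = time;
      this.value = value;
    }
  }

  get(): T | undefined {
    return this.value;
  }
}

interface MarkedSlice {
  start: number;
  end: number;
  mark: LwwRegister<string>;
}

// A slice whose mark is `undefined` is treated as deleted and is not
// revealed to the higher (rich-text) layer.
const visibleSlices = (slices: MarkedSlice[]): MarkedSlice[] =>
  slices.filter((slice) => slice.mark.get() !== undefined);
```

A Const mark has no `set`, which is why a slice with such a mark could never be deleted this way.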

Represent the slices as a Grow-only-Set of the following tuples:

Text annotations set:
slice1 = (id1, start1, end1, value1)
slice2 = (id2, start2, end2, value2)
...

Where each start and end element is composed of the range boundary ID and a flag that specifies whether the edge is inclusive or exclusive:

start = (id, isInclusive)
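
A rough sketch of such a grow-only set, assuming slices are keyed by their ID. The `ID` layout (session ID plus logical time) and the `SliceGSet` class are assumptions made for this example:

```typescript
// Hypothetical logical timestamp; the ID layout is an assumption.
type ID = [sessionId: number, time: number];
type Edge = [id: ID, isInclusive: boolean];
type Slice = [id: ID, start: Edge, end: Edge, value: unknown];

const key = (id: ID): string => id[0] + '.' + id[1];

// Grow-only set: slices can be added and merged, never removed.
class SliceGSet {
  private readonly slices = new Map<string, Slice>();

  add(slice: Slice): void {
    const k = key(slice[0]);
    if (!this.slices.has(k)) this.slices.set(k, slice);
  }

  // Merge is a set union, so it is commutative, associative and idempotent.
  merge(other: SliceGSet): void {
    other.slices.forEach((slice) => this.add(slice));
  }

  has(id: ID): boolean {
    return this.slices.has(key(id));
  }

  size(): number {
    return this.slices.size;
  }
}
```

Because the set only grows, replicas converge by exchanging slices in any order. Deletion, if wanted, has to live in the slice value.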

Nov 28 '22 10:11 streamich

We also need to be able to represent the virtual text start and end elements.

Nov 28 '22 10:11 streamich

Taxonomy:

Edge = (CharacterId, IsInclusive)
Interval = (StartEdge, EndEdge)
Slice = (SliceId, Interval, Value)

A Marker (e.g. a splitBlock boundary) is a slice whose interval is empty (collapsed), i.e. StartEdge = EndEdge:

Marker = (SliceId, Interval = StartEdge, Value)

The contents of the element of the marker can be anything.
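
The taxonomy above can be sketched in TypeScript to show how a marker falls out as a special case of a slice. The `CharacterId` layout and the helper names `sameEdge` and `isMarker` are assumptions for illustration:

```typescript
// Hypothetical ID layout: session ID plus logical time.
type CharacterId = [sessionId: number, time: number];
type Edge = [id: CharacterId, isInclusive: boolean];
type Interval = [start: Edge, end: Edge];
type Slice = [id: CharacterId, interval: Interval, value: unknown];

// Two edges are equal when they point at the same character
// and have the same inclusivity flag.
const sameEdge = (a: Edge, b: Edge): boolean =>
  a[0][0] === b[0][0] && a[0][1] === b[0][1] && a[1] === b[1];

// A marker is a slice whose interval is collapsed.
const isMarker = (slice: Slice): boolean => sameEdge(slice[1][0], slice[1][1]);
```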

Nov 28 '22 10:11 streamich

Interval consists of two "boundaries":

Boundaries
Edges
Limits
Endpoints (used in math)
Endelements

Boundaries are linked to an:

Anchor
Handle

One way to represent an edge is by a 2-tuple [element: ID, isInclusive: boolean], however then isInclusive does not describe well whether it is the "before" or "after" anchor, as defined in Peritext. Basically a boolean is not sufficient to describe the anchor point, if the boundary is used standalone (without the context of the interval).

Hence, it is better to call the boundaries "endpoints" and define them as:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
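
To illustrate why the boolean alone is ambiguous, here is a sketch that converts an `[element, isInclusive]` edge into an `Endpoint`. The conversion needs to know which side of the interval the edge was on. The `ID` layout and the `toEndpoint` helper are assumptions for this example:

```typescript
enum Anchor { Before, After }
type ID = [sessionId: number, time: number]; // hypothetical layout
type Endpoint = [element: ID, anchor: Anchor];

// An inclusive start sits before its character, an inclusive end sits after it.
// Exclusive edges are the opposite.
const toEndpoint = (
  element: ID,
  isInclusive: boolean,
  role: 'start' | 'end',
): Endpoint => {
  const before = role === 'start' ? isInclusive : !isInclusive;
  return [element, before ? Anchor.Before : Anchor.After];
};
```

The same `isInclusive = true` maps to `Anchor.Before` for a start edge and `Anchor.After` for an end edge, whereas an `Endpoint` is unambiguous on its own.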

Nov 28 '22 10:11 streamich

Taxonomy 2.0:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, value: ID];
type SliceSet = Slice[];

How to represent the "start" and "end" of the RGA sequence?

const startEndpoint = [strNodeId, Anchor.After];
const endEndpoint = [strNodeId, Anchor.Before];

Alternatively:

const startEndpoint = [undefinedId, Anchor.After];
const endEndpoint = [undefinedId, Anchor.Before];

Nov 28 '22 11:11 streamich

Efficient rich text representation:

type LocalSlice = [id: ID, start: number, length: number, value: unknown];

interface Peritext {
  text: Rope;
  slices: LocalSlice[];
}

interface Rope {
  str(start: number, length: number): string;
  interval(start: number, length: number): Rope;
}
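
A small sketch of how this local representation might be used. `StringRope` is a stand-in backed by a flat string, not a real rope, and both it and the `sliceText` helper are hypothetical names. The `ID` layout is also an assumption:

```typescript
type ID = [sessionId: number, time: number]; // hypothetical layout
type LocalSlice = [id: ID, start: number, length: number, value: unknown];

interface Rope {
  str(start: number, length: number): string;
  interval(start: number, length: number): Rope;
}

interface Peritext {
  text: Rope;
  slices: LocalSlice[];
}

// Stand-in for a real rope: wraps a flat string, for illustration only.
class StringRope implements Rope {
  private readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  str(start: number, length: number): string {
    return this.text.slice(start, start + length);
  }

  interval(start: number, length: number): Rope {
    return new StringRope(this.str(start, length));
  }
}

// Text covered by a slice, looked up by plain offsets rather than by IDs.
const sliceText = (doc: Peritext, slice: LocalSlice): string =>
  doc.text.str(slice[1], slice[2]);
```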

Nov 28 '22 11:11 streamich

Taxonomy 3.0:

Optional tag member.

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];

Nov 28 '22 13:11 streamich

Taxonomy 4.0:

Local operation:

enum Anchor { Before, After }
type Endpoint = [index: number, anchor?: Anchor]; // -1 represents the string root element.
type Interval = [start: Endpoint, end?: Endpoint];
type Slice = [interval: Interval, tag?: ID];
type SliceSet = Slice[];

Remote operation:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];
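
A possible conversion from the local form to the remote form, as a sketch only. Several things here are assumptions the thread does not settle: the `ID` layout, the `toRemote` helper and its `charIds` lookup table, the default anchors for omitted `anchor` members, and collapsing a missing `end` onto the start:

```typescript
enum Anchor { Before, After }
type ID = [sessionId: number, time: number]; // hypothetical layout

type LocalEndpoint = [index: number, anchor?: Anchor]; // -1 is the string root
type LocalInterval = [start: LocalEndpoint, end?: LocalEndpoint];
type LocalSlice = [interval: LocalInterval, tag?: ID];

type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];

const toRemote = (
  local: LocalSlice,
  sliceId: ID,
  strNodeId: ID,
  charIds: ID[], // IDs of the visible characters, in document order
): Slice => {
  const resolve = (endpoint: LocalEndpoint, role: 'start' | 'end'): Endpoint => {
    const [index, anchor] = endpoint;
    if (index === -1) {
      // Root: text start is [strNodeId, After], text end is [strNodeId, Before].
      return [strNodeId, anchor ?? (role === 'start' ? Anchor.After : Anchor.Before)];
    }
    const element = charIds[index];
    if (!element) throw new RangeError('index out of bounds: ' + index);
    // Assumed default: the interval covers both boundary characters.
    return [element, anchor ?? (role === 'start' ? Anchor.Before : Anchor.After)];
  };
  const [[start, end], tag] = local;
  const startEndpoint = resolve(start, 'start');
  // Assumption: a missing end collapses the interval onto the start (a marker).
  const endEndpoint = end ? resolve(end, 'end') : startEndpoint;
  const interval: Interval = [startEndpoint, endEndpoint];
  return tag ? [sliceId, interval, tag] : [sliceId, interval];
};
```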

Dec 02 '22 09:12 streamich
