json-joy icon indicating copy to clipboard operation
json-joy copied to clipboard

Peritext for JSON CRDT

Open streamich opened this issue 2 years ago • 9 comments

Block element implementation is described here.

streamich avatar Nov 27 '22 18:11 streamich

Peritext proposes multiple new operations:

  • Inline
    • addMark
    • removeMark
  • Block
    • splitBlock
    • joinBlock
    • updateBlock

Consider if less operations can be used.

  • As, inline elements are not actual RGA elements, maybe a new concept needs to be introduced for them.
  • All block operations can likely be represented by a single invisible "marker" element which links to another node in the document.

Possibly both scenarios can be expressed by a single "slice" concept (analogous to Peritext markers, but which link to another node in the document). If the slice starts and ends at the same spot, it degrades to a "marker"? A block splitBlock could be represented by an invisible or deleted character to which a slice is attached (start end end of the slice map to the same character).

streamich avatar Nov 27 '22 22:11 streamich

Come up with a good name for the "slices":

  • Interval
  • Segment
  • Range
  • Slice (the whole concept could be called "slice", but the payload could be called "mark")
  • Mark
  • Annotation
  • Markup
  • Span

Come up with a good name for the "payload" of a "slice":

  • Payload
  • Mark
  • Marker
  • Value
  • Decorator
  • Decoration
  • Annotation
  • Tag

A deletion of a slice could be supported. Deletion is marking the slice with a tombstone. Rather than creating a separate operation for it, the tombstone could be contained in the mark/payload. If payload is set to undefined, that means the slice has been deleted? More complications arise from the type of the mark:

  • If the mark is a LWW-Register, then we can always set its value to undefined.
  • If the mark is a Const node, then its value can never be changed, hence it can never be deleted.

In general, likely there is no necessity to support "deletion", i.e. ability to add tombstones on the RGA level. Instead, maybe the rich-text layer (Peritext) can mark ranges/slices with tombstones, if needed.

Some text annotations are very small, so it might not be beneficial to have the ability to delete them. For example, making text bold could be achieved by a slice, which has a single byte payload (say, some integer, which represents bold formatting)—in that case, deleting the slice will not be much more efficient that adding another overlay slice which reverses the bold formatting. However, it probably still would be useful to make the RGA level aware of the deleted slices, so, once a slice is deleted, it does not reveal it anymore to the higher level.

Represent the slices as a Grow-only-Set of the following tuples:

Text annotations set:
slice1 = (id1, start1, end1, value1)
slice2 = (id2, start2, end2, value2)
...

Where each start and end element is composed of the range boundary ID as well as a flag, which specifies if the if the edge is inclusive or exclusive:

start = (id, isInclusive)

streamich avatar Nov 28 '22 10:11 streamich

We also need to be able to represent the virtual text start and end elements.

streamich avatar Nov 28 '22 10:11 streamich

Taxonomy:

Edge = (CharacterId, IsInclusive)
Interval = (StartEdge, EndEdge)
Slice = (SliceId, Interval, Value)

Marker (e.g. a splitBlock boundary) is when interval is empty (collapsed), e.g. StartEdge = EndEdge:

Marker = (SliceId, Interval = StartEdge, Value)

The contents of the element of the marker can be anything.

streamich avatar Nov 28 '22 10:11 streamich

Interval consists of two "boundaries":

  • Boundaries
  • Edges
  • Limits
  • Endpoints (used in math)
  • Endelements

Boundaries are linked to an:

  • Anchor
  • Handle

One way to represent an edge is by a 2-tuple [element: ID, isInclusive: boolean], however then isInclusive does not describe well whether it is the "before" or "after" anchor, as defined in Peritext. Basically a boolean is not sufficient to describe the anchor point, if the boundary is used standalone (without the context of the interval).

image

Hence, it is better to call "boundaries" as "endpoints", and define them as:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];

streamich avatar Nov 28 '22 10:11 streamich

Taxonomy 2.0:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, value: ID];
type SliceSet = Slice[];

How to represent the "start" and "end of the RGA sequence?

const startEndpoint = [strNodeId, Anchor.After];
const endEndpoint = [strNodeId, Anchor.Before];

alternatively

const startEndpoint = [undefinedId, Anchor.After];
const endEndpoint = [undefinedId, Anchor.Before];

streamich avatar Nov 28 '22 11:11 streamich

Efficient rich text representation:

type LocalSlice = [id: ID, start: number, length: number, value: unknown];

interface Peritext {
  text: Rope;
  slices: LocalSlice[];
}

interface Rope {
  str(start: number, length: number): string;
  interval(start: number, length: number): Rope;
}

streamich avatar Nov 28 '22 11:11 streamich

Taxonomy 3.0:

  • Optional tag member.
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];

streamich avatar Nov 28 '22 13:11 streamich

Taxonomy 4.0:

Local operation:

enum Anchor { Before, After }
type Endpoint = [index: number, anchor?: Anchor]; // -1 represents the string root element.
type Interval = [start: Endpoint, end?: Endpoint];
type Slice = [interval: Interval, tag?: ID];
type SliceSet = Slice[];

Remote operation:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];

streamich avatar Dec 02 '22 09:12 streamich