json-joy
Peritext for JSON CRDT
- [ ] Implement `peritext` or `peristring` text type, see Peritext post and paper.
Block element implementation is described here.
Peritext proposes multiple new operations:
- Inline
  - `addMark`
  - `removeMark`
- Block
  - `splitBlock`
  - `joinBlock`
  - `updateBlock`
Consider whether fewer operations can be used.
- As inline elements are not actual RGA elements, maybe a new concept needs to be introduced for them.
- All block operations can likely be represented by a single invisible "marker" element which links to another node in the document.
Possibly both scenarios can be expressed by a single "slice" concept (analogous to Peritext markers, but which link to another node in the document). If the slice starts and ends at the same spot, does it degrade to a "marker"? A block `splitBlock` could be represented by an invisible or deleted character to which a slice is attached (the start and end of the slice map to the same character).
Come up with a good name for the "slices":
- Interval
- Segment
- Range
- Slice (the whole concept could be called "slice", but the payload could be called "mark")
- Mark
- Annotation
- Markup
- Span
Come up with a good name for the "payload" of a "slice":
- Payload
- Mark
- Marker
- Value
- Decorator
- Decoration
- Annotation
- Tag
A deletion of a slice could be supported. Deletion means marking the slice with a tombstone. Rather than creating a separate operation for it, the tombstone could be contained in the mark/payload. If the payload is set to `undefined`, does that mean the slice has been deleted? More complications arise from the type of the mark:
- If the mark is a LWW-Register, then we can always set its value to `undefined`.
- If the mark is a Const node, then its value can never be changed, hence it can never be deleted.
In general, likely there is no necessity to support "deletion", i.e. ability to add tombstones on the RGA level. Instead, maybe the rich-text layer (Peritext) can mark ranges/slices with tombstones, if needed.
Some text annotations are very small, so it might not be beneficial to have the ability to delete them. For example, making text bold could be achieved by a slice which has a single-byte payload (say, some integer which represents bold formatting). In that case, deleting the slice will not be much more efficient than adding another overlay slice which reverses the bold formatting. However, it would probably still be useful to make the RGA level aware of deleted slices, so that once a slice is deleted, it is no longer revealed to the higher level.
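The two options can be compared in a small sketch. It assumes plain integer offsets instead of RGA element IDs, a hypothetical one-integer payload encoding (`BOLD_ON`, `BOLD_OFF`), and that a slice with a higher ID overrides one with a lower ID; none of this is decided in this issue, and `FlatSlice`, `liveSlices` and `isBold` are illustrative names only.

```typescript
// Hypothetical payload encoding: one small integer per formatting flag.
const BOLD_ON = 1;
const BOLD_OFF = 0;

// Simplified slice: integer offsets stand in for RGA element IDs.
// `value === undefined` is treated as a tombstone (slice deleted).
interface FlatSlice {
  id: number;
  start: number; // inclusive offset
  end: number; // exclusive offset
  value: number | undefined;
}

// Hide tombstoned slices from the higher (rich-text) level.
function liveSlices(slices: FlatSlice[]): FlatSlice[] {
  return slices.filter((slice) => slice.value !== undefined);
}

// Resolve bold at `pos`: among live slices covering `pos`, the highest ID wins.
function isBold(slices: FlatSlice[], pos: number): boolean {
  let bold = false;
  const ordered = liveSlices(slices).sort((a, b) => a.id - b.id);
  for (const slice of ordered) {
    if (pos >= slice.start && pos < slice.end) bold = slice.value === BOLD_ON;
  }
  return bold;
}

// Option 1: reverse bold on offsets 4..5 with an overlay slice.
const overlaid: FlatSlice[] = [
  {id: 1, start: 0, end: 10, value: BOLD_ON},
  {id: 2, start: 4, end: 6, value: BOLD_OFF},
];

// Option 2: delete the bold slice by setting its payload to `undefined`.
const tombstoned: FlatSlice[] = [{id: 1, start: 0, end: 10, value: undefined}];
```

Both variants store roughly the same amount of data per change, which is the point made above: for tiny payloads the overlay costs about as much as a tombstone.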
Represent the slices as a Grow-only-Set of the following tuples:
Text annotations set:
slice1 = (id1, start1, end1, value1)
slice2 = (id2, start2, end2, value2)
...
Where each `start` and `end` element is composed of the range boundary ID as well as a flag, which specifies whether the edge is inclusive or exclusive:
start = (id, isInclusive)
We also need to be able to represent the virtual text start and end elements.
Taxonomy:
Edge = (CharacterId, IsInclusive)
Interval = (StartEdge, EndEdge)
Slice = (SliceId, Interval, Value)
A marker (e.g. a `splitBlock` boundary) is a slice whose interval is empty (collapsed), i.e. `StartEdge = EndEdge`:
Marker = (SliceId, Interval = StartEdge, Value)
The contents of the element of the marker can be anything.
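This first taxonomy could be written down in TypeScript roughly as follows. It is only a sketch: modelling `CharacterId` and slice IDs as strings is an assumption, and `isMarker` is a hypothetical helper that checks for a collapsed interval.

```typescript
// Assumption: character and slice IDs are modelled as opaque strings here.
type CharacterId = string;
type Edge = [id: CharacterId, isInclusive: boolean];
type Interval = [start: Edge, end: Edge];
type Slice<V = unknown> = [id: string, interval: Interval, value: V];

// A slice is a marker when its interval is collapsed: both edges are equal.
function isMarker(slice: Slice): boolean {
  const [start, end] = slice[1];
  return start[0] === end[0] && start[1] === end[1];
}

// A regular slice spanning characters "a" through "d".
const bold: Slice<string> = ['s1', [['a', true], ['d', true]], 'bold'];

// A marker attached to the single character "c".
const paragraphSplit: Slice<string> = ['s2', [['c', false], ['c', false]], 'splitBlock'];
```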
Interval consists of two "boundaries":
- Boundaries
- Edges
- Limits
- Endpoints (used in math)
- Endelements
Boundaries are linked to an:
- Anchor
- Handle
One way to represent an edge is by a 2-tuple `[element: ID, isInclusive: boolean]`, however then `isInclusive` does not describe well whether it is the "before" or "after" anchor, as defined in Peritext. Basically, a boolean is not sufficient to describe the anchor point if the boundary is used standalone (without the context of the interval).

Hence, it is better to call the "boundaries" "endpoints", and define them as:
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
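To make the ambiguity concrete, the sketch below converts a boolean-flag edge into an endpoint. The same `isInclusive` value yields a different anchor depending on whether the edge is the start or the end of its interval, which is why the boolean alone is not enough. `Role` and `toEndpoint` are hypothetical names, and treating `ID` as a string is an assumption.

```typescript
type ID = string; // Assumption: opaque element identifier.
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];

// Hypothetical: the role an edge plays inside its interval.
type Role = 'start' | 'end';

// Convert an [element, isInclusive] edge into a self-describing endpoint.
// An inclusive start sits before its character, an inclusive end sits after
// it; exclusive edges are the mirror image.
function toEndpoint(element: ID, isInclusive: boolean, role: Role): Endpoint {
  const before = role === 'start' ? isInclusive : !isInclusive;
  return [element, before ? Anchor.Before : Anchor.After];
}
```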
Taxonomy 2.0:
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, value: ID];
type SliceSet = Slice[];
How to represent the "start" and "end" of the RGA sequence?
const startEndpoint = [strNodeId, Anchor.After];
const endEndpoint = [strNodeId, Anchor.Before];
Alternatively:
const startEndpoint = [undefinedId, Anchor.After];
const endEndpoint = [undefinedId, Anchor.Before];
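A sketch of how the first variant might be resolved to offsets, assuming that `ID` is a string, that the string node has the hypothetical ID `'str'`, and that `chars` lists only visible characters (tombstones are ignored for brevity). `gapOffset` and `resolve` are illustrative helpers, not an existing API.

```typescript
type ID = string; // Assumption: opaque element identifier.
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];

// Hypothetical ID of the string node itself, used for the virtual endpoints.
const strNodeId: ID = 'str';
const startEndpoint: Endpoint = [strNodeId, Anchor.After]; // just after the root
const endEndpoint: Endpoint = [strNodeId, Anchor.Before]; // just before the end

// Map an endpoint to a gap offset in the visible characters, where gap `i`
// lies before character `i` and gap `chars.length` is the end of the text.
function gapOffset(chars: ID[], [element, anchor]: Endpoint): number {
  if (element === strNodeId) return anchor === Anchor.After ? 0 : chars.length;
  const index = chars.indexOf(element);
  if (index < 0) throw new Error(`Unknown element: ${element}`);
  return anchor === Anchor.Before ? index : index + 1;
}

// Resolve an interval to the [from, to) offsets of the characters it covers.
function resolve(chars: ID[], [start, end]: Interval): [from: number, to: number] {
  return [gapOffset(chars, start), gapOffset(chars, end)];
}

const chars: ID[] = ['a', 'b', 'c', 'd'];
const whole = resolve(chars, [startEndpoint, endEndpoint]);
const middle = resolve(chars, [['b', Anchor.Before], ['c', Anchor.After]]);
```

With this reading, `[strNodeId, Anchor.After]` and `[strNodeId, Anchor.Before]` need no special element type: the string node's own ID doubles as both virtual boundaries.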
Efficient rich text representation:
type LocalSlice = [id: ID, start: number, length: number, value: unknown];
interface Peritext {
text: Rope;
slices: LocalSlice[];
}
interface Rope {
str(start: number, length: number): string;
interval(start: number, length: number): Rope;
}
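The interfaces above could be exercised with a naive stand-in, sketched below. `StringRope` is a hypothetical flat-string implementation (a real rope would use a tree), `slicesIn` is an illustrative helper, and `ID` as a string is an assumption.

```typescript
type ID = string; // Assumption: opaque slice identifier.
type LocalSlice = [id: ID, start: number, length: number, value: unknown];

interface Rope {
  str(start: number, length: number): string;
  interval(start: number, length: number): Rope;
}

interface Peritext {
  text: Rope;
  slices: LocalSlice[];
}

// Naive stand-in for a real rope: backed by a flat string.
class StringRope implements Rope {
  private readonly data: string;
  constructor(data: string) {
    this.data = data;
  }
  str(start: number, length: number): string {
    return this.data.slice(start, start + length);
  }
  interval(start: number, length: number): Rope {
    return new StringRope(this.str(start, length));
  }
}

// Hypothetical helper: slices overlapping the range [start, start + length).
// Collapsed (zero-length) slices never overlap and are therefore skipped.
function slicesIn(doc: Peritext, start: number, length: number): LocalSlice[] {
  const end = start + length;
  return doc.slices.filter(([, s, len]) => s < end && s + len > start);
}

const richText: Peritext = {
  text: new StringRope('Hello, world!'),
  slices: [
    ['s1', 0, 5, 'bold'],
    ['s2', 7, 5, 'italic'],
  ],
};
```

Keeping `start` and `length` as plain numbers makes local rendering cheap, at the cost of having to shift them whenever text is inserted or deleted before a slice.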
Taxonomy 3.0:
- Optional `tag` member.
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];
Taxonomy 4.0:
Local operation:
enum Anchor { Before, After }
type Endpoint = [index: number, anchor?: Anchor]; // -1 represents the string root element.
type Interval = [start: Endpoint, end?: Endpoint];
type Slice = [interval: Interval, tag?: ID];
type SliceSet = Slice[];
Remote operation:
enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];
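One possible way to turn a local (index-based) slice into a remote (ID-based) one is sketched below. The defaults are assumptions that this issue does not specify: an omitted anchor is read as `Anchor.Before`, and an omitted end collapses the interval onto its start (a marker). `ID` as a string, the `Local*` type names, `toEndpoint` and `toRemote` are likewise hypothetical.

```typescript
type ID = string; // Assumption: opaque element identifier.
enum Anchor { Before, After }

// Local (index-based) shapes, as in the "Local operation" taxonomy.
type LocalEndpoint = [index: number, anchor?: Anchor]; // -1 is the string root.
type LocalInterval = [start: LocalEndpoint, end?: LocalEndpoint];
type LocalSlice = [interval: LocalInterval, tag?: ID];

// Remote (ID-based) shapes, as in the "Remote operation" taxonomy.
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];

// Assumption: an omitted anchor defaults to `Anchor.Before`.
function toEndpoint(chars: ID[], rootId: ID, [index, anchor]: LocalEndpoint): Endpoint {
  const element = index === -1 ? rootId : chars[index];
  if (element === undefined) throw new RangeError(`Index out of bounds: ${index}`);
  return [element, anchor ?? Anchor.Before];
}

// Assumption: an omitted end collapses the interval onto its start (a marker).
function toRemote(chars: ID[], rootId: ID, id: ID, [[start, end], tag]: LocalSlice): Slice {
  const from = toEndpoint(chars, rootId, start);
  const to: Endpoint = end ? toEndpoint(chars, rootId, end) : [from[0], from[1]];
  return tag === undefined ? [id, [from, to]] : [id, [from, to], tag];
}

const chars: ID[] = ['a', 'b', 'c'];
const boldSlice = toRemote(chars, 'str', 's1', [[[0, Anchor.Before], [1, Anchor.After]], 'bold']);
const marker = toRemote(chars, 'str', 's2', [[[2, Anchor.After]]]);
```

The local form stays compact because indexes, anchors, ends and tags can all be omitted, while the remote form carries the stable IDs that other replicas need.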