
Feature request: redact(contentIds) and/or sanitize(minimumFrontier) API

Open jcmoore opened this issue 1 year ago • 0 comments

Following up on a Discord discussion, and with the ongoing/upcoming work on garbage collection in mind, I'd like to formally request an API to permanently redact content from document history. This would be useful when an editor needs to delete sensitive data from a document (passwords, access tokens, personally identifiable information covered by compliance requirements, etc.).

I think an ideal API would support the following UX (in a text document):

  1. app displays a version picker to the user
  2. user picks a version in which the content to be redacted is visible (i.e. not "soft deleted")
  3. user highlights the content to be redacted
  4. user clicks a redact button (and acknowledges warnings about "irrevocable permanent deletion")
  5. app makes it so the selected content is never again visible regardless of the version of the document presented (and hopefully generates sync ops that will cause the same effect on any other doc that merges them in -- which may require mutating the otherwise immutable log if I understand correctly)

I think, if possible, the above has the advantage of preserving positional history and remaining independent from other garbage collection features/plans (but has the disadvantage of maybe needing slightly different implementations for each of the text/map/tree/etc CRDTs).

I suspect if there were a redact op that takes the same content ids as a delete op currently relies on, the code would be very similar between the two -- just the extra step of actually replacing the content for each id with "nothing" (assuming byte lengths need to be preserved).
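To make the delete/redact comparison concrete, here is a minimal sketch using a toy in-memory log. It is not Loro's internal representation or API: `ToyTextLog`, `ContentId`, `FILLER`, and the `redact` method are all hypothetical names invented for illustration, and preserving length with a filler character is only the assumption stated above.

```typescript
// Toy model only: none of these names exist in Loro.
type ContentId = string; // hypothetical id, e.g. "peer@counter"

interface Item {
  id: ContentId;
  content: string;  // payload inserted by the op
  deleted: boolean; // tombstone flag set by a normal (soft) delete
}

// Assumption: redacted payloads keep their length and are filled with this.
const FILLER = "*";

class ToyTextLog {
  private items: Item[] = [];

  insert(id: ContentId, content: string): void {
    this.items.push({ id, content, deleted: false });
  }

  // Soft delete: the item is hidden, but its payload stays in history.
  delete(ids: ContentId[]): void {
    const targets = new Set(ids);
    for (const item of this.items) {
      if (targets.has(item.id)) item.deleted = true;
    }
  }

  // Hypothetical redact: same targeting as delete, plus one extra step
  // that overwrites the stored payload with filler of the same length.
  redact(ids: ContentId[]): void {
    const targets = new Set(ids);
    for (const item of this.items) {
      if (targets.has(item.id)) {
        item.deleted = true;
        item.content = FILLER.repeat(item.content.length);
      }
    }
  }

  // Text as shown at the latest version (tombstoned items hidden).
  visibleText(): string {
    return this.items.filter((i) => !i.deleted).map((i) => i.content).join("");
  }

  // Everything the history still retains, tombstoned items included.
  retainedText(): string {
    return this.items.map((i) => i.content).join("");
  }
}
```

In this toy, `delete` and `redact` share the same targeting loop, so positions in the log are unchanged either way; only `redact` makes the payload unrecoverable from `retainedText()`.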

An alternative approach, one that likely has implications for how the planned garbage collector would work, could be an API on Loro documents that guarantees the contents of a document deleted prior to some minimum frontier have been permanently removed -- something like doc.sanitize(minimumFrontier).
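A rough sketch of what that guarantee could mean, again on a toy model rather than Loro itself. The big simplification (an assumption made here, not something from Loro) is that a frontier is a single logical timestamp; `ToyDoc`, `Entry`, `textAt`, and `sanitizedUpTo` are hypothetical names.

```typescript
// Toy model only: a "frontier" is simplified to one logical timestamp.
interface Entry {
  id: string;
  content: string;
  insertedAt: number;       // logical time of the insert
  deletedAt: number | null; // logical time of the delete, if any
}

class ToyDoc {
  private entries: Entry[] = [];
  private clock = 0;
  sanitizedUpTo = 0; // highest frontier this doc has been sanitized to

  insert(id: string, content: string): number {
    this.clock += 1;
    this.entries.push({ id, content, insertedAt: this.clock, deletedAt: null });
    return this.clock;
  }

  delete(id: string): number {
    this.clock += 1;
    for (const e of this.entries) {
      if (e.id === id && e.deletedAt === null) e.deletedAt = this.clock;
    }
    return this.clock;
  }

  // Hypothetical sanitize: permanently drop every entry that was deleted
  // at or before the given frontier, and remember how far we sanitized.
  sanitize(minimumFrontier: number): void {
    this.entries = this.entries.filter(
      (e) => e.deletedAt === null || e.deletedAt > minimumFrontier
    );
    this.sanitizedUpTo = Math.max(this.sanitizedUpTo, minimumFrontier);
  }

  // Text as it would be shown when checking out an earlier version.
  textAt(version: number): string {
    return this.entries
      .filter((e) => e.insertedAt <= version)
      .filter((e) => e.deletedAt === null || e.deletedAt > version)
      .map((e) => e.content)
      .join("");
  }
}
```

Because sanitized entries are dropped outright, checking out an older version no longer shows them, and their positions are lost as well.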

Using a sanitization facility like the above, there are surely a number of approaches to erasing sensitive data (albeit at the cost of positional history and stale version mergeability). Here is pseudocode describing one potential approach in which a server ensures that clients who wish to publish changes have first sanitized their documents to some minimum threshold.
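As an illustration of that kind of gate, a minimal sketch might look like the following. Everything in it is an assumption made for the example: the frontier is reduced to a single number, ops are opaque strings, and `ToySyncServer`, `PublishRequest`, and `PublishResult` are hypothetical names rather than part of Loro or any real sync server.

```typescript
// Toy model only: frontiers are plain numbers, ops are opaque strings.
interface PublishRequest {
  clientId: string;
  sanitizedUpTo: number; // frontier the client claims to have sanitized to
  ops: string[];         // changes the client wants to publish
}

type PublishResult =
  | { accepted: true }
  | { accepted: false; requiredFrontier: number };

class ToySyncServer {
  private minimumFrontier = 0;
  private log: string[] = [];

  // Called after a redaction: from now on, clients must have sanitized
  // at least up to this frontier before their changes are accepted.
  raiseMinimumFrontier(frontier: number): void {
    this.minimumFrontier = Math.max(this.minimumFrontier, frontier);
  }

  publish(req: PublishRequest): PublishResult {
    if (req.sanitizedUpTo < this.minimumFrontier) {
      // Reject and tell the client how far it has to sanitize first.
      return { accepted: false, requiredFrontier: this.minimumFrontier };
    }
    this.log.push(...req.ops);
    return { accepted: true };
  }

  acceptedOps(): readonly string[] {
    return this.log;
  }
}
```

A rejected client would sanitize its local document up to `requiredFrontier` and retry. This sketch trusts the client's claimed `sanitizedUpTo`; a real design would need some way to verify it.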

jcmoore, Jun 14 '24 18:06