dwn-sdk-js

Add a '$singleton' option for graph points to keep only the latest single record

Open csuwildcat opened this issue 1 year ago • 7 comments

Many times a graph point is intended to hold only one record, never multiple. If we added a '$singleton' boolean option, it would let the dev specify that an object in the graph should keep only one record, the latest one, with all others discarded. Think of a blog protocol with an index HTML record: you only ever want one, don't really care about the recordId, and just want the latest kept. The $singleton option would make that possible.
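To make the blog case concrete, here is a rough sketch of what such a protocol definition might look like. The `$singleton` key is the option proposed in this issue and does not exist yet; the protocol URI, the type names, and the simplified interfaces are placeholders for illustration only.

```typescript
// Simplified, hypothetical shapes; not the SDK's actual type definitions.
interface RuleSet {
  $singleton?: boolean; // proposed option: keep only the latest record at this path
  [childType: string]: RuleSet | boolean | undefined;
}

interface BlogProtocolDefinition {
  protocol: string;
  published: boolean;
  types: Record<string, { dataFormats: string[] }>;
  structure: Record<string, RuleSet>;
}

const blogProtocol: BlogProtocolDefinition = {
  protocol: 'https://example.com/protocols/blog', // placeholder URI
  published: true,
  types: {
    index: { dataFormats: ['text/html'] },
    post: { dataFormats: ['text/markdown'] },
  },
  structure: {
    index: { $singleton: true }, // only the latest index record would be kept
    post: {},                    // posts are unrestricted
  },
};
```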

csuwildcat avatar Aug 09 '23 12:08 csuwildcat

  1. Can $singleton apply to records other than the root record?
  • If yes, how does that work? Say foo/bar is a $singleton. Can there only ever be one record at path foo/bar across contexts? If I try to write a new foo/bar, does that get rejected, or does it delete the existing foo/bar, which may be in a different context?
  • If no, we need to validate upon protocol ingestion. It seems like the point of $singleton is to restrict a protocol to only one context.
  2. I want to unpack the phrase "just want the latest kept". If I have $singleton for protocol path foo, and a tree of descendant records below it, will writing a new record to path foo delete the entire existing tree? That's pretty drastic and dangerous behavior.

diehuxx avatar Aug 11 '23 18:08 diehuxx

I've been working on how this would work and have some WIP PRs around it.

As discussed with @csuwildcat, and I believe suggested by @diehuxx, a "$keep" property holding a positive integer, rather than a "$singleton" property that can only denote 1, would give better flexibility.

Some thoughts on @diehuxx's questions above:

Yes, I think "$keep" records should be able to be nested. Currently, when a new record is written, the older ones are purged.

However the reject path is also interesting, and I could see how that might be useful. Maybe behind a protocol definition property that denotes the behavior of $keep, with default being purge. Maybe something like this:

"$keep" : {
    "limit" : 1,
    "strategy": "purge" | "reject"
}
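A minimal sketch of how the two strategies might behave on write, assuming records are ordered by a creation timestamp. The names `KeepRule`, `StoredRecord`, and `applyKeep` are made up for illustration and are not part of the SDK or the WIP PRs.

```typescript
type KeepStrategy = 'purge' | 'reject';

interface KeepRule {
  limit: number;
  strategy?: KeepStrategy; // defaults to 'purge', as suggested above
}

interface StoredRecord {
  recordId: string;
  dateCreated: number; // assumed ordering key (epoch milliseconds)
}

interface KeepOutcome {
  accepted: boolean;
  kept: StoredRecord[];
  purged: StoredRecord[];
}

function applyKeep(existing: StoredRecord[], incoming: StoredRecord, rule: KeepRule): KeepOutcome {
  if (rule.limit < 1 || rule.limit % 1 !== 0) {
    throw new Error('$keep.limit must be a positive integer');
  }
  const strategy: KeepStrategy = rule.strategy === undefined ? 'purge' : rule.strategy;

  // 'reject': refuse the write once the limit is reached, leaving existing records alone.
  if (strategy === 'reject' && existing.length >= rule.limit) {
    return { accepted: false, kept: existing.slice(), purged: [] };
  }

  // 'purge': sort newest first, keep the first `limit` records, purge the rest.
  const newestFirst = existing.concat([incoming]).sort((a, b) => b.dateCreated - a.dateCreated);
  return {
    accepted: true,
    kept: newestFirst.slice(0, rule.limit),
    purged: newestFirst.slice(rule.limit),
  };
}
```

Note that under 'purge' an incoming record older than everything already kept would itself land in `purged`; whether that is the desired behavior is an open question.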

For anything that is a child of a protocol context, the parentId and contextId are required upon creation of that record. Currently, I "$keep" the limit number of records within that context.

So you could have "foo/bar" with a "$keep" limit of 5; you would then keep 5 "foo/bar"s for each parent instance of "foo".

If you had 5 "foo" records, each holding its full 5 "foo/bar" records, you would have a total of 25 "foo/bar" records. If you query only on protocolPath 'foo/bar' without a contextId, you will get all 25 records.
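The per-parent counting can be sketched as follows. This is an in-memory illustration only, assuming each child record carries a `parentId` and a creation timestamp; `ChildRecord` and `keepLatestPerParent` are hypothetical names, not SDK APIs.

```typescript
interface ChildRecord {
  recordId: string;
  parentId: string;    // the "foo" instance this "foo/bar" belongs to
  dateCreated: number; // assumed ordering key
}

// Keeps at most `limit` newest records under each parent and returns the survivors.
function keepLatestPerParent(records: ChildRecord[], limit: number): ChildRecord[] {
  // Group the child records by their parent.
  const byParent: Record<string, ChildRecord[]> = {};
  for (const record of records) {
    (byParent[record.parentId] = byParent[record.parentId] || []).push(record);
  }

  // Within each group, sort newest first and keep the first `limit`.
  const kept: ChildRecord[] = [];
  for (const parentId of Object.keys(byParent)) {
    const newestFirst = byParent[parentId].sort((a, b) => b.dateCreated - a.dateCreated);
    for (const record of newestFirst.slice(0, limit)) {
      kept.push(record);
    }
  }
  return kept;
}
```

With 5 parents and a limit of 5, the result never exceeds 25 records, no matter how many "foo/bar" records were written under each parent.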

Would like to get some input on this.

LiranCohen avatar Aug 15 '23 16:08 LiranCohen

@LiranCohen Looks good!

We discussed at office hours. I'll summarize:

  • I like the name $keep and the structure you proposed.
  • We should leave out strategy for now. reject isn't worth implementing now (ever?). The inevitable DevEx for reject is bad because.
  • My remaining concern is about how to implement this in a way that accommodates sync, in particular how to make purging a record tree performant. When a record is purged, all of its descendants in the protocol are also purged. When purged, the record and its descendants must be deleted from the event log. How do we efficiently get the message CIDs of all descendants?
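For reference, the naive way to gather those CIDs is a breadth-first walk over a parent-to-children index, sketched below. It assumes every record exposes `recordId`, an optional `parentId`, and a `messageCid`; the names are illustrative, and this does not settle the performance question, since building the index here still scans every record.

```typescript
interface TreeRecord {
  recordId: string;
  parentId?: string;  // undefined for root records
  messageCid: string; // CID of the message to remove from the event log
}

// Returns the message CIDs of the record `rootId` and all of its descendants.
function collectSubtreeCids(records: TreeRecord[], rootId: string): string[] {
  // Build a parent -> children index and locate the root in a single pass.
  const childrenOf: Record<string, TreeRecord[]> = {};
  let root: TreeRecord | undefined;
  for (const record of records) {
    if (record.recordId === rootId) {
      root = record;
    }
    if (record.parentId !== undefined) {
      (childrenOf[record.parentId] = childrenOf[record.parentId] || []).push(record);
    }
  }
  if (root === undefined) {
    return [];
  }

  // Breadth-first traversal from the root.
  const cids: string[] = [];
  const queue: TreeRecord[] = [root];
  while (queue.length > 0) {
    const current = queue.shift() as TreeRecord;
    cids.push(current.messageCid);
    for (const child of childrenOf[current.recordId] || []) {
      queue.push(child);
    }
  }
  return cids;
}
```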

diehuxx avatar Aug 15 '23 18:08 diehuxx

Or maybe just $limit as it was originally brought up?

thehenrytsai avatar Aug 15 '23 21:08 thehenrytsai

@thehenrytsai I think they like keep because it implies purging/retention.

csuwildcat avatar Aug 15 '23 22:08 csuwildcat

@csuwildcat, I see, fair, so: $keep: 1 seems reasonable.

thehenrytsai avatar Aug 15 '23 22:08 thehenrytsai