stargate TimeUUID in DocumentAPI

Cassandra (C*) supports time-based UUIDs (TimeUUID), and it would be great to use this in the Document API, for example as the keys of sub-objects. Sub-objects could then be sorted, or requested for specific dates, based on their TimeUUID keys.

{
  "DialogID": "6782658762783649273",
  "StepID": "78326487234",
  "DialogName": "Error step",
  "State": "open",
  "Comments": {
    "8bfa51a4-caa3-11eb-b8bc-0242ac130003": {   // TimeUUID
      "Body": "It is very sad",
      "CommentUser": "Anola Gate"
    },
    "8bfa51a4-caa3-11eb-b8bc-06545465465": {
      "Body": "Come on, pour and drink!",
      "IDuser": "Pablo Escobar"
    }
  }
}

Jun 17 '21 14:06 sash2222

@EricBorczuk or @dimas-b can you weigh in on this? Since the path (leaf?) is stored as a text column, I'm guessing we would need to do this sorting in Stargate rather than in C*. We would probably also need some way to mark that property as a TimeUUID rather than a normal UUID, since we would still want to support alphanumeric sorting by default.

Jun 17 '21 18:06 dougwettlaufer

In this case the TimeUUID value is part of what the Docs API implementation calls a "path", so its value will be stored in one of the pNNN columns and become a clustering key.

I suppose the most relevant use case would be fetching sub-documents at that path, where responses would paginate over the "comment" objects. In this case we could leverage C*'s native TimeUUID order for WHERE conditions and paging. Sorting in memory may not be ideal here, as I imagine this optimization is meaningful only for large volumes of data.
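To illustrate why the text ordering of the path columns is not the same as TimeUUID order, here is a minimal Python sketch. It is not Docs API code: the second key is made up for the example, `sort_keys_by_time` is a hypothetical helper, and breaking ties by the string value is an assumption rather than C*'s exact comparison rule.

```python
import uuid

# First key is taken from the example document above; the second is a
# made-up version 1 UUID whose embedded timestamp is slightly later.
earlier = "8bfa51a4-caa3-11eb-b8bc-0242ac130003"
later = "00000001-caa4-11eb-b8bc-0242ac130003"


def sort_keys_by_time(keys):
    """Sort version 1 UUID strings by their embedded 60-bit timestamp.

    Ties are broken by the string itself, which is an assumption made
    for this sketch and not necessarily how C* compares timeuuid values.
    """
    return sorted(keys, key=lambda k: (uuid.UUID(k).time, k))


keys = [earlier, later]

# Plain text ordering compares the low timestamp bits first, so the
# later key sorts ahead of the earlier one.
text_order = sorted(keys)

# Ordering by the embedded timestamp restores chronological order.
time_order = sort_keys_by_time(keys)
```

In a version 1 UUID the first group of hex digits holds the low bits of the timestamp, which is why sorting the keys as text does not generally give chronological order.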

If the whole document is returned, the order of TimeUUID keys may also be preserved, but that can be done purely within Docs API code if we know the property is a TimeUUID. Perhaps we can leverage the recently added JSON schema for that, but I need to read up on that to be sure.

So, all in all, it looks like this request requires prior knowledge of the JSON document schema.

If the schema is known at the time we create the collection table, I guess we should be able to support specific types for each path element.

After thinking about this, it looks like a non-trivial amount of work to me.

@sash2222 : What is a rough estimate on the number of "comments" in your use cases?

Jun 17 '21 21:06 dimas-b

Hi @dimas-b. The number of comments within a document in this scenario will be quite small, no more than 50. There may also be "branches" in the form of replies to comments. In the future, however, this approach could be applied more widely, for example to an operator-client dialogue, where the number of comments can reach 1K or more. Moreover, it would be great to be able to manage clustering keys in general, using not only TimeUUID but other value types as well.

Jun 18 '21 07:06 sash2222

Re: clustering keys, currently they are derived from JSON object property names, so they are limited to strings in the input data. Reinterpreting those strings is certainly possible if the document schema is known.
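As a rough illustration of what "reinterpreting those strings" could mean, here is a small Python sketch. Everything in it is hypothetical: the `KEY_PARSERS` table, the `key_type` hint, and `ordered_children` are invented for this example and do not correspond to anything in the Docs API or its JSON schema support.

```python
import uuid

# Hypothetical key-type hints that a document schema might supply.
# Each parser turns a property name (always a string in JSON) into a
# value that sorts in the intended order.
KEY_PARSERS = {
    "text": lambda k: k,                      # default: alphanumeric order
    "int": int,                               # numeric order
    "timeuuid": lambda k: uuid.UUID(k).time,  # embedded-timestamp order
}


def ordered_children(obj, key_type="text"):
    """Return (key, value) pairs of a JSON object ordered by typed key."""
    parse = KEY_PARSERS[key_type]
    return sorted(obj.items(), key=lambda kv: parse(kv[0]))


# Made-up sub-object whose property names are numeric strings.
steps = {"10": "c", "9": "b", "2": "a"}

as_text = [k for k, _ in ordered_children(steps)]
as_int = [k for k, _ in ordered_children(steps, key_type="int")]
```

Without a type hint the keys keep today's alphanumeric order ("10" before "2"); with the hint they sort numerically. The same lookup would cover TimeUUID keys, but only if the schema says which paths hold them.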

This is certainly very interesting input into how Docs API can evolve / improve. I guess we'll keep this issue open for consideration in the future.

Short-term, given the small data size I hope this issue is not a blocker for you, @sash2222.

Jun 18 '21 14:06 dimas-b

Of course this is not a blocking request :). Thank you for your willingness to help and openness to dialogue!

Jun 18 '21 14:06 sash2222