hash-db Document DB and joins

Open samsquire opened this issue 2 years ago • 9 comments

I would like to support document-style storage.

I think document-based storage can be implemented on top of key-value or relational storage, with each document being a view of a record.

This might be slower than a pure document database.

The use case I want to support is joins between documents, similar to MarkLogic XQuery.

A primary key corresponds to a set of fields in a structure. This structure needs to be cheap to create. MongoDB stores the entire structure as one blob, as BSON.

I think we can instead store the fields of the structure separately, each one indexed.

{"_id": "1",
"name": "Samuel Squire",
"hobbies": [
{"name": "God"}, {"name": "databases"}, {"name":"multicomputer systems"}
]
}

If we store the blob and also store the data again as indexed key-value pairs, we get denormalisation and the potential for joins. For example, I could store the keys 1.name = "Samuel Squire", 1.hobbies.0.name = "God", 1.hobbies.1.name = "databases", 1.hobbies.2.name = "multicomputer systems".
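A minimal sketch of that flattening step, assuming the document is already parsed into Python dicts and lists. The function name `flatten` and the `id_field` parameter are made up for illustration and are not part of hash-db.

```python
def flatten(doc, id_field="_id"):
    """Return {dotted_key: scalar} for one document, prefixed by its primary key."""
    pairs = {}

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + [name])
        elif isinstance(node, list):
            # List members are addressed by their position.
            for index, child in enumerate(node):
                walk(child, path + [str(index)])
        else:
            pairs[".".join(path)] = node

    primary_key = str(doc[id_field])
    body = {name: value for name, value in doc.items() if name != id_field}
    walk(body, [primary_key])
    return pairs


person = {
    "_id": "1",
    "name": "Samuel Squire",
    "hobbies": [{"name": "God"}, {"name": "databases"}, {"name": "multicomputer systems"}],
}
pairs = flatten(person)
for key, value in pairs.items():
    print(key, "=", value)
```

Each printed pair can be written to the key-value store as is, alongside the original blob.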

We need an efficient method of storing a tree data structure that allows re-creation of the tree from any point up to the root.

If I run select * from people where hobbies.name = "God", I need efficient access to a person's hobbies structure.
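One way to answer that query over flattened keys is a secondary index from (field path, value) to the keys that hold the value. The sketch below is only an assumption about how this could look: the second person in `store` is invented sample data, and `field_path`, `value_index` and `select_ids` are hypothetical names. It also assumes field names are never purely numeric, because numeric components are treated as list indexes.

```python
from collections import defaultdict

# Hypothetical flattened store holding two people.
store = {
    "1.name": "Samuel Squire",
    "1.hobbies.0.name": "God",
    "1.hobbies.1.name": "databases",
    "2.name": "Another Person",
    "2.hobbies.0.name": "databases",
}


def field_path(key):
    """Drop the primary key and list indexes: '1.hobbies.0.name' -> 'hobbies.name'."""
    parts = key.split(".")[1:]
    return ".".join(part for part in parts if not part.isdigit())


# Secondary index: (field path, value) -> set of full keys holding that value.
value_index = defaultdict(set)
for key, value in store.items():
    value_index[(field_path(key), value)].add(key)


def select_ids(path, value):
    """Primary keys of the documents in which `path` equals `value`."""
    keys = value_index.get((path, value), set())
    return sorted({key.split(".", 1)[0] for key in keys})


print(select_ids("hobbies.name", "God"))
print(select_ids("hobbies.name", "databases"))
```

The matched full keys (for example 1.hobbies.0.name) also say where inside the document the match happened, which is the starting point for walking back up to the record.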

So a Rockset-style converged index would be useful for this use case. But we still have the problem of re-creating the data structure from hobbies upward to the whole of record 1.

We could do a prefix search for "1", which would give us the entire structure for that person. But we would then need to deserialize the data into a tree, which is slow.
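To make the cost concrete, here is a rough sketch of the prefix search followed by the tree rebuild. It assumes the keys live in a sorted list that can be binary-searched, and that purely numeric components are list indexes. `prefix_scan`, `rebuild` and `listify` are hypothetical names, and the "10.name" entry is invented to show that the scan for "1" does not pick up other records.

```python
import bisect

store = {
    "1.name": "Samuel Squire",
    "1.hobbies.0.name": "God",
    "1.hobbies.1.name": "databases",
    "1.hobbies.2.name": "multicomputer systems",
    "10.name": "Someone Else",
}
sorted_keys = sorted(store)


def prefix_scan(primary_key):
    """All keys that start with '<primary_key>.', found by binary search."""
    low = bisect.bisect_left(sorted_keys, primary_key + ".")
    # "/" is the character right after "." in ASCII, so this bounds the range.
    high = bisect.bisect_left(sorted_keys, primary_key + "/")
    return sorted_keys[low:high]


def listify(node):
    """Turn dicts whose keys are all numeric into lists, recursively."""
    if not isinstance(node, dict):
        return node
    if node and all(name.isdigit() for name in node):
        return [listify(node[name]) for name in sorted(node, key=int)]
    return {name: listify(child) for name, child in node.items()}


def rebuild(primary_key):
    """Deserialize every key under the prefix back into a nested structure."""
    root = {}
    for key in prefix_scan(primary_key):
        parts = key.split(".")[1:]
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = store[key]
    return listify(root)


print(rebuild("1"))
```

Every key is split and walked from the root, and a second pass converts index-keyed dicts into lists. That intermediate tree is the work the explicit numbering scheme tries to avoid.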

We could enforce an ordering on each inner object and list and number their members explicitly. Generating JSON then becomes extremely simple and needs very little state.

Keys would be forbidden from overlapping, and they must remain sorted.

For example, we could store:

1.0 = "Samuel Squire"
1.1.0.0 = "God"
1.1.1.0 = "databases"
1.1.2.0 = "multicomputer systems"

Types: 1 is object, 1.0 is string, 1.1 is list, and 1.1.0, 1.1.1 and 1.1.2 are objects.

Names: 1.0 is name, 1.1 is hobbies, 1.1.x.0 is name.
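A small sketch of how the numbered keys above could be produced. It assumes each object's fields are numbered in the order they appear in the parsed document, whereas a real implementation would more likely take the numbers from a fixed schema. `encode` and `sort_key` are hypothetical helpers. Note that plain string comparison would place "1.10" before "1.2", so the sketch compares keys as tuples of integers.

```python
def encode(doc, primary_key):
    """Return (numbered_key, value) rows for one document, in key order."""
    rows = []

    def walk(node, path):
        if isinstance(node, dict):
            # Fields are numbered by their position inside the object.
            for position, child in enumerate(node.values()):
                walk(child, path + [position])
        elif isinstance(node, list):
            for position, child in enumerate(node):
                walk(child, path + [position])
        else:
            rows.append((".".join(str(part) for part in path), node))

    walk(doc, [primary_key])
    return rows


def sort_key(key):
    """Compare keys numerically, component by component."""
    return tuple(int(part) for part in key.split("."))


person = {
    "name": "Samuel Squire",
    "hobbies": [{"name": "God"}, {"name": "databases"}, {"name": "multicomputer systems"}],
}
rows = encode(person, 1)
for key, value in rows:
    print(key, "=", value)
```

Because the walk is depth-first, the rows already come out in sorted key order, so they can be appended to a sorted store without a separate sort.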

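import json
# Rows in sorted key order: "<primary key>.<field numbers>" -> value.
rows = [("1.0", "Samuel Squire"), ("1.1.0.0", "God"),
        ("1.1.1.0", "databases"), ("1.1.2.0", "multicomputer systems")]
# Schema keyed by field path without the primary key; "x" matches any list index.
field_types = {"": "object", "0": "string", "1": "list", "1.x": "object", "1.x.0": "string"}
field_names = {"0": "name", "1": "hobbies", "1.x.0": "name"}

def schema_key(fields):
    out = []
    for component in fields:
        # A component directly under a list is an index, so it maps to "x".
        out.append("x" if field_types.get(".".join(out)) == "list" else component)
    return ".".join(out)

def to_json(rows):
    out, open_path, need_comma = [], [], False  # open_path: containers not yet closed
    def begin(path):
        # Emit a separator if needed and, inside an object, the field name.
        nonlocal need_comma
        if need_comma:
            out.append(", ")
        if len(path) > 1 and field_types[schema_key(path[1:-1])] == "object":
            out.append(json.dumps(field_names[schema_key(path[1:])]) + ": ")
        need_comma = False
    def close_to(depth):
        nonlocal need_comma
        while len(open_path) > depth:
            out.append("}" if field_types[schema_key(open_path[1:])] == "object" else "]")
            open_path.pop()
            need_comma = True
    for key, value in rows:
        parts = key.split(".")
        shared = 0
        while shared < min(len(open_path), len(parts) - 1) and open_path[shared] == parts[shared]:
            shared += 1
        close_to(shared)                        # leave containers the new key is not in
        while len(open_path) < len(parts) - 1:  # enter containers the new key is in
            open_path.append(parts[len(open_path)])
            begin(open_path)
            out.append("{" if field_types[schema_key(open_path[1:])] == "object" else "[")
        begin(parts)
        out.append(json.dumps(value))
        need_comma = True
    close_to(0)
    return "".join(out)
print(to_json(rows))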

May 08 '22 21:05 samsquire
