Sled Best Practices

Open D1plo1d opened this issue 4 years ago • 3 comments

As a NoSQL novice, I don't always find it easy to tell how I should structure my first sled database. What are some best practices the community has to share for a sled newbie like myself?

I realize that's a bit broad so I have taken the time to list some specific topics I've been wondering about:

  • Do you use strings as keys? numbers? structs or other types? What are the advantages to your approach?
  • Do you use one tree per document type/RDBMS "table" (eg. a "users" tree) or do you use key prefixes to differentiate many types in a single tree? Which approach is best?
  • What struct serialization formats are well suited to sled? The docs use bincode, but that doesn't appear to provide the backwards/forwards compatibility guarantees that I'd expect to want in a database that might span multiple iterations of my software (so multiple Rust versions/bincode versions/maybe even CPU architectures). What has the community adopted here?

Edit: One more question I promise!

  • How do you handle database migrations? Does anyone do zero downtime deploys? I've thought about serializing my values as an enum containing one version of my struct and then recursively migrating the structs on read to get their latest representation. Is anyone experimenting in this direction?
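
A minimal sketch of the enum-per-version idea from the question above, so the shape is concrete. Everything here is hypothetical: the `UserV1`/`UserV2` structs, the `StoredUser` enum, and the `into_latest` method are illustrative names, and serialization is left out on purpose (it assumes whatever format is chosen can round-trip the enum variant).

```rust
/// Oldest layout of a hypothetical user record.
#[derive(Debug, Clone, PartialEq)]
struct UserV1 {
    name: String,
}

/// Newer layout that adds a field (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct UserV2 {
    name: String,
    display_name: String,
}

/// The value that would actually be serialized into the tree:
/// one variant per historical version of the struct.
#[derive(Debug, Clone, PartialEq)]
enum StoredUser {
    V1(UserV1),
    V2(UserV2),
}

/// The rest of the program only ever sees the latest version.
type User = UserV2;

impl StoredUser {
    /// Migrate one version step at a time until the latest is reached.
    fn into_latest(self) -> User {
        match self {
            // V1 -> V2: fill the new field from existing data, then recurse
            // so any later migration steps would run as well.
            StoredUser::V1(old) => StoredUser::V2(UserV2 {
                display_name: old.name.clone(),
                name: old.name,
            })
            .into_latest(),
            // Already the latest version: nothing to do.
            StoredUser::V2(user) => user,
        }
    }
}
```

With this shape, a read would deserialize a `StoredUser` and call `into_latest()` on it; whether the migrated value is also written back is a separate design choice that the sketch does not address.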

D1plo1d avatar May 28 '20 13:05 D1plo1d

Just for your information, as Conduit uses sled: Copied from https://git.koesters.xyz/timo/conduit/wiki/Database

Trees

A single sled database can contain many trees, each of which can store key-value pairs.

In Conduit we have a high-level overview of all database parts and their trees in src/database.rs. For example, the users part of the database contains the userid_password and todeviceid_events trees. The naming scheme for the trees is key type + underscore + value type. The types are explained in more detail in their respective module.

For example the key type for todeviceid_events is todeviceid which is explained in src/database/users.rs as UserId + DeviceId + Count. This means that the key is built by taking the user id bytes, followed by a 0xff byte as a separator, followed by the device id bytes and so on.

Useful methods on trees

Tree::scan_prefix

Create an iterator over tuples of keys and values, where all the keys start with the given prefix.

This is useful for trees with keys that are split into multiple parts, like in our example above todeviceid_events. If we want to get all todevice events for a user, we run scan_prefix with the user id + 0xff (delimiter) as the argument. If we want to limit these events to a particular device, we append the device id + 0xff to that key.

Tree::range

Create a double-ended iterator over tuples of keys and values, where the keys fall within the specified range.

This might seem like a rare use case at first, but when you realize that open ranges are supported and it returns an iterator (very efficient), you can use it for many things like getting the last 10 events in a room or all EDUs before a timestamp (when the tree’s key starts with the timestamp).

This is also the reason why Conduit has a global counter which is always increasing for each new event (also referred to as since, count or batch token).
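
The composite-key layout and the two lookups described above can be sketched roughly as follows. This is not Conduit's code: the helper names are made up for illustration, the counter is assumed to be encoded as 8 big-endian bytes, and a `BTreeMap<Vec<u8>, Vec<u8>>` from the standard library stands in for a sled tree so the sketch has no dependencies. Both order keys by their bytes, and with sled the corresponding calls would be `Tree::scan_prefix` and `Tree::range`.

```rust
use std::collections::BTreeMap;

/// Separator between key parts. 0xff never occurs in valid UTF-8,
/// so it cannot collide with the bytes of a string id.
const SEP: u8 = 0xff;

/// Build a `UserId + DeviceId + Count` key (hypothetical helper).
/// The count is big-endian (an assumption) so byte order matches numeric order.
fn todevice_key(user_id: &str, device_id: &str, count: u64) -> Vec<u8> {
    let mut key = user_id.as_bytes().to_vec();
    key.push(SEP);
    key.extend_from_slice(device_id.as_bytes());
    key.push(SEP);
    key.extend_from_slice(&count.to_be_bytes());
    key
}

/// Prefix covering all keys of one user, optionally narrowed to one device.
fn todevice_prefix(user_id: &str, device_id: Option<&str>) -> Vec<u8> {
    let mut prefix = user_id.as_bytes().to_vec();
    prefix.push(SEP);
    if let Some(device_id) = device_id {
        prefix.extend_from_slice(device_id.as_bytes());
        prefix.push(SEP);
    }
    prefix
}

/// Stand-in for a prefix scan: start at the prefix and stop at the
/// first key that no longer begins with it.
fn scan_prefix(tree: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    tree.range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}

/// Stand-in for a range query: the `n` newest values for one device,
/// newest first, by walking the range backwards.
fn last_n(tree: &BTreeMap<Vec<u8>, Vec<u8>>, user_id: &str, device_id: &str, n: usize) -> Vec<Vec<u8>> {
    let start = todevice_key(user_id, device_id, 0);
    let end = todevice_key(user_id, device_id, u64::MAX);
    tree.range(start..=end)
        .rev()
        .take(n)
        .map(|(_, value)| value.clone())
        .collect()
}
```

Walking the range backwards only returns the newest entries because the counter always increases and its big-endian encoding sorts in numeric order.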

valkum avatar Oct 01 '20 12:10 valkum

How does Conduit treat this Database wrapper struct (described in database.rs)? Does each new struct/process that wants to access some part of the database get a cloned copy of the Database struct? Or is there some global storage (e.g. a lazy_static + mutex) which each process locks and shares?

leahcornelius avatar Apr 16 '22 15:04 leahcornelius

@timokoesters I noticed you no longer use sled. I wonder why?

gklijs avatar Jun 07 '23 06:06 gklijs