sled icon indicating copy to clipboard operation
sled copied to clipboard

Add something similar to `set_merge_operator`, but for comparison & hashing

Open ckaran opened this issue 4 years ago • 2 comments

Can you add a method similar to set_merge_operator, but for comparisons/hashing?

Use Case:

I want to use sled to store events for my custom written simulator. Events should be stored in temporal order, so I have timestamps for each and every event that needs to be stored, (disambiguated with a UUID so I don't lose events) as keys. The timestamps are ultimately stored as rug::Rational instances, which have a sort order that is different than the lexicographic order for &[u8]. This affects things like first, iter, last, pop_max, pop_min, and range. This means that I can't rely on pop_min returning the next event, nor can I simply scan through a log of events to look for interesting events.

I expect that typed trees will be able to have custom sort orders defined on them, but I'm concerned that under the hood they'll have to find a way of mapping to a lexicographic sort order, which may be expensive. This request is to solve that before the problem comes up.

Proposed Change:

Create a new method similar to set_merge_operator for a comparison & hash operator that knows how to correctly compare/hash the keys in a given tree. The operator will need to implement Eq, PartialEq, Ord, PartialOrd, and Hash and be a pure function (for speed reasons). I don't know if the operator will need to be set before the first item is entered into the tree, or if the tree will be able to rebuild itself to match the new operators; that is something that will need to be decided before any implementation work is done though, as it will affect both correctness and parallel operation.

Who Benefits From The Change(s)?

Anyone whose keys have sort orders different from their lexicographic sort orders, and which can't easily be forced into a lexicographic ordering. In my case, the simulator is temporally fuzzy; I may get the results for a set of events out of temporal order, and I have no way of predicting how close a pair of events are (which means I can't round to the nearest nanosecond and call it a day). Others will likely have keys that have to be put 'in between' other keys that are already in the database, which means mapping to an integer won't work, and mapping to a string might lead to excessively long strings.

Alternative Approaches

  • Do nothing - The easiest thing to do is to ignore this request, and do nothing. As I mentioned above, this could have a negative effect on typed trees.
  • Map keys to other values, and use the other values as keys in the database - This requires a way of generating new values that can be fit between values that are already in the database. The issue is that in the worst case your keys can get very, very long as you try to split the difference between two existing keys.

ckaran avatar Sep 10 '20 13:09 ckaran

RocksDB provides a Comparator API to customize key ordering. There is a note on backwards compatibility in that doc too (not possible).

shikhar avatar Nov 30 '20 02:11 shikhar

That makes sense, but is nothing new. If you look at the code for set_merge_operator, and the definition of merge_operator, it's obvious that you can change the merge operator defined on a tree at any time. This is also a backwards compatibility issue, and one that end users need to be aware of, but not afraid of. In the worst case, you'd just create before and after tags in your repository for the changes, and do what is necessary to export/import the data from the old to the new databases. For really old databases, you'll have to look through your tags, your change log, and any other notes you have to figure out where you changed the schema, and then do what is necessary to get the old code running well enough to pull the data out.

All that said, if you're really worried about bitrot, you probably want to periodically dump your data into several different formats, each of which has their own schema documentation somewhere. That way, even if you can't access your data using your own code, you might be able to reload it through one of the other formats.

ckaran avatar Nov 30 '20 13:11 ckaran