HaloDB
Why are range scans not supported?
Range scans are not supported in HaloDB. If I want to add support for range scans, what should I do?
You need to implement some sort of key ordering
@ahasani Is there a more detailed document about HaloDB's design?
It's in the README, mate; the direct link is here.
IMHO this excerpt is of particular interest to you: "LSM tree and B-Tree also maintain an ordering of keys to support efficient range scans, but the cost they pay is a read amplification greater than 1, and for LSM tree, very high write amplification. Since our workload only does point lookups, we don’t want to pay the cost associated with storing data in a format suitable for range scans."
Cheers
@shuaijunlan range scans are currently not supported. The workload for which HaloDB was designed only does point lookups.
I haven't given this problem much thought yet, but a possible approach could be to use an ordered index in memory. We should ideally not order the data on disk as this would increase read and write amplification for HaloDB, or might force us to do random writes.
HaloDB currently has an in-memory index, which is an off-heap concurrent hash table.
To support range scans we could use a ConcurrentSkipList as the index, but since HaloDB needs to handle large data sets, we would need a skip list implementation that allocates its memory outside the heap.
More research is needed to figure out how this will perform and the resources it will consume.
This is probably a big effort, and I don't plan to do this immediately.
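To make the ordered-index idea concrete, here is a minimal sketch using the JDK's `java.util.concurrent.ConcurrentSkipListMap`. It is not HaloDB code: the class `OrderedIndexSketch`, the `Location` type and its fields are hypothetical names chosen for illustration, and this map lives on the Java heap, so it does not address the off-heap requirement mentioned above. It only shows that keeping the index ordered is enough to answer a range scan while the data files stay unordered.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical ordered index: key bytes -> location of the value in a data file.
// Names and layout are assumptions for illustration, not HaloDB's internals.
final class OrderedIndexSketch {

    // Where a value lives on disk (illustrative fields only).
    static final class Location {
        final int fileId;
        final long offset;

        Location(int fileId, long offset) {
            this.fileId = fileId;
            this.offset = offset;
        }
    }

    // Lexicographic order over unsigned bytes (Arrays.compareUnsigned needs Java 9+).
    private static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;

    // On-heap skip list; large data sets would need an off-heap equivalent.
    private final ConcurrentSkipListMap<byte[], Location> index =
            new ConcurrentSkipListMap<>(KEY_ORDER);

    void put(byte[] key, Location location) {
        // Defensive copy: a key must not change once it is in the index.
        index.put(key.clone(), location);
    }

    // Point lookup; O(log n) expected, compared with O(1) for a hash table.
    Location get(byte[] key) {
        return index.get(key);
    }

    // Locations of all keys in [from, to), returned in key order.
    List<Location> scan(byte[] from, byte[] to) {
        return new ArrayList<>(index.subMap(from, true, to, false).values());
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OrderedIndexSketch idx = new OrderedIndexSketch();
        idx.put(bytes("user:3"), new Location(0, 200));
        idx.put(bytes("user:1"), new Location(0, 0));
        idx.put(bytes("order:9"), new Location(1, 64));
        idx.put(bytes("user:2"), new Location(0, 100));

        // ';' is the byte right after ':', so this range covers every "user:" key.
        for (Location loc : idx.scan(bytes("user:"), bytes("user;"))) {
            System.out.println(loc.fileId + "@" + loc.offset);
        }
    }
}
```

Values are still written to disk in arrival order; only the index is sorted. The trade-off is that each scanned key may trigger a random read, and point lookups become logarithmic rather than constant time.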
@amannaly Thank you for your guidance. Is there a more detailed document about HaloDB's design?