rangemap
rangemap copied to clipboard
some possible inefficiences
rangemap
This is one my favorate data structures. Thank you for implementing it.
I believe ranges are under utilized in practise in general.
I implemented the "same" thing in C++ years ago.
My implementation has some efficiency gaps and I think yours does too.
I'm not good at Rust, but when I went to reimplement my C++ open and perfected, I ran into, something I think you "messed up" too.
Specifically I think there need not be any linear searches.
Including here:
https://github.com/jeffparsons/rangemap/blob/master/src/map.rs#L155
while let Some((stored_range_start_wrapper, stored_value)) = self
while I have not finished my code, I believe you can limit yourself to two binary searches -- for the start and the end.
Also here, I think your implementation sometimes does an insert where it can reuse an existing range.
I realize this structure has surprisingly many cases to handle and making it perfect is not trivial.
e.g. you should avoid this: https://github.com/jeffparsons/rangemap/blob/dc7f07d154acd6f3ead119e92fd496fbe69f4802/src/map.rs#L323
// Delete the stored range, and then add back between
// 0 and 2 subranges at the ends of the range to insert.
self.btm.remove(&stored_range_start_wrapper);
Hi Jay, thanks for getting in touch! :wave:
https://github.com/jeffparsons/rangemap/blob/master/src/map.rs#L155
Unfortunately the Rust BTreeMap API is quite limited at the moment, so I think removing all entries between the start and end of the range we're interested isn't possible. I'm sure it's possible to optimise the current implementation a bit, but rather than doing that and making it significantly more complex now, I'd rather wait until features like https://github.com/rust-lang/rust/pull/81075 are stabilized and I can optimise it and make it simpler at the same time. :blush:
The Holy Grail in my mind would be a proper BTreeMap cursor API that would let you navigate the tree in whatever direction(s) you want, and delete/insert at whatever point you're at. A handy extension would be methods that consume two non-mutating cursors to do things like removing that whole range from the map, which would let you efficiently find your start and end points however you want and then remove all elements in the range without having to re-find those elements you already found.
Also here, I think your implementation sometimes does an insert where it can reuse an existing range.
I absolutely agree, and this is one of the things I'm least happy with. But as above, I'm hesitant to optimise it right now and make it much more complex in the process given that I know there are some better upstream APIs on the horizon that should give me the best of both worlds.
In short, yepp, I completely agree with you, but I don't really have the time right now to implement and maintain those optimizations.
Yeah, C++ maybe has advantages here, it already has the iterators (cursor) and the range delete, though there also inconsistencies between global upper_bound / lower_bound and set/map::upper_bound/lower_bound -- set/map force you to always start at the root, but ideal here is, I think, you do upper or lower bound of the range start, and then from there for the end.
And to further agree, I put more time into this in C++ and it is, tricky. There are so many cases, overlap, adjacent, on each side (of each side), and the data matching or not. A few cases can be combined.
Also to be clear, in terms of insert/delete with cursor, it should be a hint probably...depending on if the underlying storage is a vector whose order you maintain, or a tree that maintains its own ordering. I generally prefer the tree/set/map underlying except for that nagging problem in STL where upper_bound/lower_bound in set/map cannot take a starting point.
While working on #40 I noticed that there are a few places where a key is cloned. This seems like it could be inefficient depending on what's being used as the key, but I'm not sure what could be used as an alternative.