im-rs icon indicating copy to clipboard operation
im-rs copied to clipboard

Speeding up split/split_lookup

Open hengchu opened this issue 5 years ago • 4 comments

Thank you for this wonderful package :)

In my application, I'm using OrdMap to keep track of data structures that has many small updates. OrdMap helps me save memory, and speed up cloning.

However, I recently noticed that the split_lookup function could use a better implementation, as the TODO here suggested:

https://github.com/bodil/im-rs/blob/6942c68682bed13e19a5c2141807dfbfd5411abd/src/ord/map.rs#L1381-L1401

According to my benchmark, this current split implementation is actually slower than simply doing a m.into_iter().filter(|(k, v)| k < threshold).collect::<OrdMap<_, _>>(). But the filter based implementation isn't ideal either.

In principle, we should be able to take advantage of the ordering, and discard a whole range of values all at once. I guess the tricky part is this may break some shape-related invariants (like balancedness) of OrdMap, and we need to restore them efficiently.

I'd love to help with this. Could you share some pointers on where to get started? Thanks!

hengchu avatar Apr 17 '20 18:04 hengchu

I wish I had something useful to share, but the body on B-trees seems remarkably uninterested in anything but the base operations, from what I've been able to dig up by searching. Maybe the BTreeMap implementation can offer some insights? I haven't really studied it and don't know if they do anything smart here, but I suspect if an algorithm exists at all, they're using it.

bodil avatar Apr 22 '20 13:04 bodil

I haven't dug into this part of the crate, but I know what needs done to speed up this operation in the BTree. I can implement this if you want @hengchu?

nomad010 avatar May 02 '20 23:05 nomad010

Oh, that reminds me, I've been through the literature since I wrote the above, as I'm working on a different B-tree implementation, and Graefe's Modern B-Tree Techniques is an excellent survey paper to start with.

bodil avatar May 03 '20 10:05 bodil

Just got bitten by this after spending a fair bit of time profiling -- understand a fix is hard, though maybe updating the documentation to mention the slowness would help, since the top level module docs suggest most things are O(log n), so I'd assumed this was too since it's not mentioned.

Echoing Hengchu, thanks for the great lib, love it.

grantslatton avatar Oct 28 '22 23:10 grantslatton