lru-rs icon indicating copy to clipboard operation
lru-rs copied to clipboard

Reduce boxing

Open Ralith opened this issue 4 years ago • 7 comments

The current implementation boxes every key/value pair to obtain a stable address with which to construct an intrusive linked list. While investigating new designs following #70, I realized that this indirection and the associated reduction in locality can, in principle, be avoided.

The payload of a HashMap is already heap-allocated, and only moves when the container's capacity grows and it's rehashed. An intrusive linked list could therefore be soundly maintained by the HashMap implementation itself by including the links in the underlying raw table and rewriting them when copying them over in the course of a resize.

Unfortunately, I can't think of a way to implement this without this crate containing its own full hash table implementation; even hashbrown's RawTable API does not expose a way to customize the rehashing procedure, and I don't see a way to do it as a second pass. Maybe worth reaching out to @Amanieu or someone to see if there's a way hashbrown could enable this so an effective fork, and the resulting maintenance headache, isn't necessary?

Ralith avatar Feb 06 '20 16:02 Ralith

Have you considered using a copy of the key instead of a pointer? Generally keys are pretty small, but you would need K: Clone which may be a breaking change.

Amanieu avatar Feb 06 '20 17:02 Amanieu

Ah nevermind, this would work poorly if the key is something like a String.

Regarding the hash table, I don't think hashbrown is a good fit, but you may be interested in https://github.com/Amanieu/intrusive-rs/pull/37.

Amanieu avatar Feb 06 '20 17:02 Amanieu

I don't think hashbrown is a good fit

It'd be nice to benefit from hashbrown's optimizations. It seems like the only missing piece here is the ability for the caller to control the rehashing operation. In particular, it would be necessary for this lib to remove elements from the old table and insert them into the new one in LRU list order, constructing the correct chain as it goes.

That could be accomplished by something as simple as impl RawTable<T> { fn resize(&mut self, n: usize, rehash: impl Fn(&mut [MaybeUninit<T>], &mut [MaybeUninit<T>])) }. Is anything like that completely out of the question?

Ralith avatar Feb 06 '20 23:02 Ralith

On review, I think this could be accomplished with the existing interface by constructing the LRU list from bucket indices rather than raw pointers. As a bonus, I think this would allow removing nearly all of the unsafe from this crate, including the dicey uninitialized stuff.

Ralith avatar May 22 '20 00:05 Ralith

@jeromefroe, would you be interested in a PR to replace the use of pointers with bucket indices, and thereby remove all the boxing?

Ralith avatar Aug 01 '20 20:08 Ralith

@Ralith definitely, would be more than happy to review it.

jeromefroe avatar Aug 02 '20 14:08 jeromefroe

Hmm, bucket indexes still need to be rewritten on rehash. This would be really easy if hashbrown exposed a hook to rewrite entries on rehash, and otherwise seems infeasible.

Ralith avatar Aug 02 '20 19:08 Ralith