hashbrown icon indicating copy to clipboard operation
hashbrown copied to clipboard

Conversion of raw pointers into buckets

Open florian1345 opened this issue 1 year ago • 5 comments

Currently, it is possible to convert Buckets into raw pointers through Bucket::as_ptr, but I was unable to find a reverse. Would it be possible to offer a [unsafe] fn Bucket::from_ptr(ptr: *mut T) -> Bucket<T>?

As a use case, I am using raw pointers into a raw table and would like to efficiently store optional pointers as nullable. A function like this would allow me to model it as follows.

struct OptionalBucket<T> {
    ptr: *mut T
}

impl<T> OptionalBucket<T> {
    fn some(bucket: Bucket<T>) -> OptionalBucket<T> {
        OptionalBucket { ptr: bucket.as_ptr() }
    }
    
    fn none() -> OptionalBucket<T> {
        OptionalBucket { ptr: std::ptr::null_mut() }
    }
    
    fn as_option(self) -> Option<Bucket<T>> {
        if self.ptr.is_null() { None } else { Some(Bucket::from_ptr(self.ptr)) }
    }
}

The best I found is the ugly and implementation-dependent std::mem::transmute(ptr.add(1)).

florian1345 avatar Jun 06 '23 15:06 florian1345

This doesn't work when T is a ZST since all buckets will have the same address.

Also Bucket itself already contains a NonNull, so it will efficiently pack itself into an Option.

Amanieu avatar Jun 06 '23 16:06 Amanieu

I did not know about that property of Bucket, thank you for enlightening me!

I attempted an implementation using that, but I ran into another issue. As I am trying to construct a linked list of buckets, I would like a way to have dummy head/tail buckets to enable easy removal, however I found no clean way to do that. Since conversion from raw pointer to Bucket is not viable due to the ZST issue, maybe there is some other possible API to enable that use case?

Two ways I came up with would be to make a free-standing bucket (which I suppose violates the intuition of what a bucket is supposed to be) or allowing accessing RawTable entries with pointers (e.g. RawTable::remove_ptr).

florian1345 avatar Jun 17 '23 09:06 florian1345

I don't think that would work, can't you just use an enum here.


Incidentally I had a look at your lru-mem crate: have you considered doing something similar to IndexMap except that instead of putting the entries in a Vec, putting them in a VecDeque instead? That way you still have O(1) lookups and O(1) removal of the least recently used entry by popping it off one end of the deque.

Amanieu avatar Jun 17 '23 15:06 Amanieu

As I am trying to construct a linked list of buckets, I would like a way to have dummy head/tail buckets to enable easy removal, however I found no clean way to do that.

You are using Bucket to build a linked list, won't Bucket fail when new values are inserted in the RawTable?

yyy33 avatar Mar 13 '24 15:03 yyy33

Sorry for not replying sooner (and thanks for the ping). I have somewhat forgotten about this issue as I have worked on other projects.

You are using Bucket to build a linked list, won't Bucket fail when new values are inserted in the RawTable?

I use try_insert_no_grow, which should retain the addresses of all filled buckets, unless I am missing something fundamental. If reallocation is necessary, I reconstruct the entire linked list using the new pointers.

Incidentally I had a look at your lru-mem crate: have you considered doing something similar to IndexMap except that instead of putting the entries in a Vec, putting them in a VecDeque instead? That way you still have O(1) lookups and O(1) removal of the least recently used entry by popping it off one end of the deque.

This would no longer allow O(1) deletion from arbitrary positions in the list, which I require for the LRU-cache to ensure O(1) getting. A get-operation on an LRU-cache marks the retrieved element as "most recently used" and thus has to remove it from the list and put it at the head. In a VecDeque, this would be O(n).

I don't think that would work, can't you just use an enum here.

That would require a branch during removal. If the removed element has a predecessor, it must be updated to point to the removed element's successor as its own. If the removed element does not have a predecessor, this cannot be done (same for successors). To avoid having to branch here, I use a dummy head/tail element to serve as the predecessor of the first element and successor of the last element. Then, I can ignore this condition and handle each element equally. I learned this trick from the lru crate, which does it very similarly.

After considering the implications for hashbrown as laid out by you, I am not sure if there is a clean way to integrate this use case into the RawTable API. In the end, maybe the simplest way to resolve this issue would be for me to fork the RawTable implementation (with proper attribution of course) and specialize it for my use case?

florian1345 avatar Apr 01 '24 19:04 florian1345