hashbrown icon indicating copy to clipboard operation
hashbrown copied to clipboard

HashTable iterator over entries

Open TylerBloom opened this issue 1 year ago • 4 comments

Hello, I have a crate that creates a key-to-key map (i.e. a "double-sided" map). This is implemented with two maps that index into each other. I want to implement an iterator similar to hash_table::ExtractIf, but I must provide an reference to both the left and right elements to the predicate and have that iterator yield both elements.

The problem is that there is no direct way to solve this with the existing APIs to the HashTable and still make the borrow checker happy. The best solution that I've come up with the existing APIs is to use HashTable::extract_if on one table to "smuggle out" elements from the other side and then pull the smuggled item out with Iterator::map. This works but is far from ideal:

let push = Rc::new(RefCell::new(None));
let pull = push_cache.clone();
let right_set = &mut self.right_set;
self.left_set
    .extract_if(move |left| {
        let Ok(right_entry) =
            right_set.find_entry(left.hash, hash_and_id(left.hash, left.id))
        else {
            // NOTE: This shouldn't happen.
            return false;
        };
        if f(&left.value, &right_entry.get().value) {
            push.borrow_mut().insert(right_entry.remove().0);
            true
        } else {
            false
        }
    })
    .map(move |l| {
        let Some(r) = pull.borrow_mut().take() else {
            panic!()
        };
        (l.value, r.value)
    })

However, if there were an iterator over the hash table that yielded OccupiedEntrys rather than just references, this iterator would be trivial to construct. I could zip the two iterators together, pass the references to the predicate, and, if true, call OccupiedEntry::remove on both entries.

Would such an iterator be possible to implement for HashTable?

TylerBloom avatar Dec 11 '24 18:12 TylerBloom

So, just to be clear, you want an iterator that yields OccupiedEntry for occupied entries and… something else, for vacant entries?

Note that we can't return VacantEntry because that requires storing a key, and we won't know what the key is for empty entries. Perhaps Option<OccupiedEntry> or something similar would work.

In this case, does it even need to be OccupiedEntry? Would it be possible to just return Option<(&K, &V)> for occupied entries?

clarfonthey avatar Dec 11 '24 21:12 clarfonthey

I think the solution for this is the same as #588: some form of cursor API which allows you to advance through the table while choosing at each step whether to remove the element.

The API could look something like this (very rough draft):

struct ExtractCursor<'a>;

impl ExtractCursor<'_> {
    fn get(&self) -> &T;
	fn get_mut(&mut self) -> &mut T;
    fn remove(&mut self) -> T;
    fn next(&mut self) -> bool;
}

Amanieu avatar Dec 11 '24 23:12 Amanieu

So, just to be clear, you want an iterator that yields OccupiedEntry for occupied entries and… something else, for vacant entries?

Sorry, I should have been more clear. I only want to iterate over the occupied entries so that I can (optionally) remove values from them. I do not have a need to iterate over the vacant entries (nor do I know how one would do that). To put this into context, the code sample I provided could be rewritten (loosely) like this with an iterator over entries:

// NOTE: Each table contains wrapper structs that contain data to point 
self.left_set.iter_entries()
             // Get the corresponding entry from the right table
             .map(|l| (l, self.right_set.find_entry(l.hash, hash_and_id(l.hash, l.id)).unwrap())))
             // Pass it through the predicate (similar to `extract_if` or `retain`)
             .filter(|(l, r)| predicate(l.get(), r.get()))
             // Call remove on the entries because they passed through and yield both items
             .map(|(l, r)| (l.remove(), r.remove()))

The cursor API could also work. The solution would look a lot different, but that's fine by me.

TylerBloom avatar Dec 11 '24 23:12 TylerBloom

Note that the current OccupiedEntry holds a &mut HashTable, which means we cannot have multiple entries in existence at the same time -- the Iterator API is too flexible for what you're trying to do. A cursor avoids that problem by only ever having one instance for the table, and you can extract items without your Item counting as a table borrow.

cuviper avatar Dec 11 '24 23:12 cuviper