
Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable`

Open crepererum opened this issue 1 year ago • 9 comments

What

Migrate from hashbrown::raw::RawTable to hashbrown::hash_table::HashTable.

Why

RawTable and raw_entry are removed in hashbrown 0.15.1; see https://github.com/apache/datafusion/pull/13256. Personally, I think it's the right thing to do, because RawTable was a bit of a weird API.

How

First, we need to get some memory accounting for HashTable. This can be done in a similar way to

https://github.com/apache/datafusion/blob/e25f5e7485ffcd810f96c7be096b04b3cacf30b3/datafusion/common/src/utils/proxy.rs#L110-L111

/// Extension trait for hashbrown's [`HashTable`] to account for allocations.
pub trait HashTableAllocExt {
    /// Item type.
    type T;

    /// Insert new element into table and increase
    /// `accounting` by any newly allocated bytes.
    ///
    /// Returns the bucket where the element was inserted.
    /// Note that allocation counts capacity, not size.
    ///
    /// # Example:
    ///
    /// Illustrative sketch (the accounted bytes follow from the 16-element
    /// minimum growth step in the implementation below):
    ///
    /// ```
    /// # use datafusion_common::utils::proxy::HashTableAllocExt;
    /// # use hashbrown::hash_table::HashTable;
    /// let mut table: HashTable<u32> = HashTable::new();
    /// let mut allocated = 0;
    /// table.insert_accounted(100, |x| *x as u64, &mut allocated);
    /// assert_eq!(allocated, 16 * std::mem::size_of::<u32>());
    /// ```
    fn insert_accounted(
        &mut self,
        x: Self::T,
        hasher: impl Fn(&Self::T) -> u64,
        accounting: &mut usize,
    );
}

impl<T> HashTableAllocExt for HashTable<T>
where
    T: Eq,
{
    type T = T;

    fn insert_accounted(
        &mut self,
        x: Self::T,
        hasher: impl Fn(&Self::T) -> u64,
        accounting: &mut usize,
    ) {
        let hash = hasher(&x);

        // NOTE: `find_entry` does NOT grow!
        match self.find_entry(hash, |y| y == &x) {
            Ok(_occupied) => {}
            Err(_absent) => {
                if self.len() == self.capacity() {
                    // need to request more memory
                    let bump_elements = self.capacity().max(16);
                    let bump_size = bump_elements * size_of::<T>();
                    *accounting = (*accounting).checked_add(bump_size).expect("overflow");

                    self.reserve(bump_elements, &hasher);
                }

                // still need to insert the element since first try failed
                self.entry(hash, |y| y == &x, hasher).insert(x);
            }
        }
    }
}
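For intuition, the accounting policy in `insert_accounted` (grow in doubling bumps with a 16-element minimum, and record the requested bytes) can be sketched in isolation with a plain `Vec` as a stand-in container. This is a simplified, hypothetical analog, not DataFusion code; a real hash table also allocates control bytes, so it only approximates the true footprint:

```rust
use std::mem::size_of;

/// Simplified stand-in for the accounting policy in `insert_accounted`:
/// grow in doubling bumps (minimum 16 elements) and add the requested
/// bytes to `accounting`.
fn push_accounted<T>(vec: &mut Vec<T>, x: T, accounting: &mut usize) {
    if vec.len() == vec.capacity() {
        let bump_elements = vec.capacity().max(16);
        let bump_size = bump_elements * size_of::<T>();
        *accounting = (*accounting).checked_add(bump_size).expect("overflow");
        vec.reserve(bump_elements);
    }
    vec.push(x);
}

fn main() {
    let mut v: Vec<u64> = Vec::new();
    let mut accounting = 0usize;
    for i in 0..20 {
        push_accounted(&mut v, i, &mut accounting);
    }
    // two bumps (16, then 16 more elements) of 8 bytes each
    assert_eq!(accounting, 256);
}
```

Note that `accounting` tracks the capacity we requested, not the elements actually stored, which is exactly the "allocation counts capacity, not size" caveat above.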

Then, migrate every RawTable to HashTable. Since RawTable is used all over the place, I think this should be split into multiple PRs. Luckily, HashTable is already available in hashbrown 0.14.x, so we can do this iteratively before upgrading to hashbrown 0.15.x.

Progress

  • [x] #13514
  • [x] https://github.com/apache/datafusion/pull/13524
  • [x] https://github.com/apache/datafusion/pull/13658

crepererum avatar Nov 15 '24 09:11 crepererum

Thanks @crepererum for the steady progress on this

Dandandan avatar Dec 06 '24 19:12 Dandandan

TBH the last two batches are rather hard:

hashbrown 0.14 & allocation size

hashbrown 0.14 doesn't expose the allocation size for HashTable which we would need here:

https://github.com/apache/datafusion/blob/47569b21c50eab42771ca62fd237f429362e8a62/datafusion/physical-plan/src/joins/stream_join_utils.rs#L152-L158

hashbrown 0.15 offers that, but then we have to roll the RawTable->HashTable change together with the upgrade, not as separate steps.
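Until then, one stopgap would be to estimate the allocation from the table's capacity. This is a hypothetical helper, not DataFusion code: the one-control-byte-per-bucket overhead is an assumption, and hashbrown's real layout differs slightly, so treat it as an approximation rather than an exact account:

```rust
use std::mem::size_of;

/// Hypothetical workaround while on hashbrown 0.14: estimate a hash
/// table's heap allocation from its capacity. The `+ 1` models roughly
/// one control byte per bucket (an assumption, not hashbrown's exact
/// layout).
fn estimated_table_size<T>(capacity: usize) -> usize {
    capacity * (size_of::<T>() + 1)
}

fn main() {
    // e.g. a table of u64 entries with capacity 128:
    assert_eq!(estimated_table_size::<u64>(128), 128 * 9);
}
```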

ArrowHashTable

This is a bit of a wild interface and I dunno if it's worth the 15% uplift TBH:

https://github.com/apache/datafusion/blob/47569b21c50eab42771ca62fd237f429362e8a62/datafusion/physical-plan/src/aggregates/topk/hash_table.rs#L63-L86

The issue here is that HashTable doesn't offer an unsafe "directly address the bucket via index" kind of interface because it is fundamentally rather risky. I wonder if we should somewhat redesign this part. As far as I understand, this uses the following concepts:

  1. data heap: a "heap" (which I guess is just a vector) that stores the actual payload data.
  2. mutable slot: a slot that stores a mutable index into the data heap. I guess this is used to update the data pointer whenever a new/better value is found. The slots are referenced by a simple usize index.
  3. key lookup: a way to look up key -> mutable slot.

The current solution fuses 2 & 3 into a single RawTable w/ a LOT of unsafe. I think we could deconstruct that into a HashTable (for 3) + Vec (for 2). This can be done BEFORE the hashbrown 0.15 update.
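The deconstruction could look roughly like this. Everything here is a hypothetical sketch: `TopKTable`, `upsert`, and the use of std's `HashMap` in place of hashbrown's `HashTable` (to keep the example self-contained) are all illustrative assumptions, not the proposed final design:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical sketch of splitting the fused RawTable into three parts.
struct TopKTable<K, V> {
    /// (3) key lookup: key -> slot index
    lookup: HashMap<K, usize>,
    /// (2) mutable slots: slot -> index into `heap`
    slots: Vec<usize>,
    /// (1) data heap: the actual payload data
    heap: Vec<V>,
}

impl<K: Hash + Eq, V> TopKTable<K, V> {
    fn new() -> Self {
        Self {
            lookup: HashMap::new(),
            slots: Vec::new(),
            heap: Vec::new(),
        }
    }

    /// Store `value` on the data heap and point `key`'s slot at it,
    /// allocating a new slot the first time the key is seen.
    /// Returns the slot index. (Stale heap entries are simply left in
    /// place here; a real implementation would compact or evict them.)
    fn upsert(&mut self, key: K, value: V) -> usize {
        let data_idx = self.heap.len();
        self.heap.push(value);
        match self.lookup.get(&key).copied() {
            Some(slot) => {
                // known key: redirect its slot to the new/better value
                self.slots[slot] = data_idx;
                slot
            }
            None => {
                let slot = self.slots.len();
                self.slots.push(data_idx);
                self.lookup.insert(key, slot);
                slot
            }
        }
    }
}

fn main() {
    let mut t = TopKTable::new();
    assert_eq!(t.upsert("a", 1), 0); // new key -> slot 0
    assert_eq!(t.upsert("b", 2), 1); // new key -> slot 1
    assert_eq!(t.upsert("a", 3), 0); // existing key reuses slot 0
    assert_eq!(t.heap[t.slots[0]], 3); // slot 0 now points at the newer value
}
```

Because the slot indices stay stable across updates, the heap side can keep referring to slots by plain `usize` without any of the unsafe bucket addressing the current RawTable version needs.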

crepererum avatar Dec 09 '24 10:12 crepererum

This is a bit of a wild interface and I dunno if it's worth the 15% uplift TBH:

I think this is something @avantgardnerio and @thinkharderdev may know more about. Perhaps they can offer some suggestions or thoughts about improving the performance, or some benchmarks we can use to see how much performance would be affected if we switched to the newer hashbrown version.

alamb avatar Dec 09 '24 20:12 alamb

I think this is something @avantgardnerio and @thinkharderdev may know more about.

The main reason that 15% was so important was to keep it on par with the unoptimized version: if CPU cost is equal, then the optimizer rule is easy, just always apply this rule.

If we are okay taking a performance hit, or if @crepererum 's idea keeps performance the same, then there's no reason to keep the unsafe code.

avantgardnerio avatar Dec 09 '24 20:12 avantgardnerio

The current solution fuses 2 & 3 into a single RawTable w/ a LOT of unsafe. I think we could deconstruct that into a HashTable (for 3) + Vec (for 2). This can be done BEFORE the hashbrown 0.15 update.

The basic idea is just a hash map where the values are sorted, so you can use it as a combined aggregate-by-key + top-k heap. @avantgardnerio and I went through a couple of iterations on the design and landed on this one because it doesn't regress performance when the grouping key is low cardinality.

thinkharderdev avatar Dec 10 '24 01:12 thinkharderdev

If we are okay taking a performance hit, or if @crepererum 's idea keeps performance the same, then there's no reason to keep the unsafe code.

My idea was ideally not to regress significantly, or at least not without discussing it first.

crepererum avatar Dec 10 '24 10:12 crepererum

Cross-linking the original `raw` feature deprecation in hashbrown: https://github.com/rust-lang/hashbrown/pull/546

comphead avatar Jun 03 '25 23:06 comphead

Regarding the PRs above: I think it is agreed that we replace RawTable with HashTable across DataFusion?

comphead avatar Jun 03 '25 23:06 comphead

Regarding the PRs above: I think it is agreed that we replace RawTable with HashTable across DataFusion?

Correct. As far as I can tell, this is the last usage of it:

https://github.com/apache/datafusion/blob/3fe464035d545d467bb5e611cf98f693bf57e9a0/datafusion/physical-plan/src/aggregates/topk/hash_table.rs#L57

crepererum avatar Jun 12 '25 11:06 crepererum