itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Support for custom hasher for std groupping

Open a14e opened this issue 2 years ago • 3 comments

Overview

I want to suggest an improvement for the library -- the ability to use custom hashes (specifically interested in ahash).

Why?

The standard library uses a sip hash, which is protected against various types of attacks on hash tables, but it is known to be quite slow. Alternative hashes can be 10+ times faster (for example, there is a benchmark example in ahash)

What difficulties might arise?

It's unclear how to implement this

  1. Option 1. Keep the current implementation, but add a third argument with a default value, for example, the methods would look like this:
    #[cfg(feature = "use_std")]
        fn into_group_map<K, V, R = RandomState>(self) -> HashMap<K, Vec<V>, R>
        where
            Self: Iterator<Item = (K, V)> + Sized,
            K: Hash + Eq,
            R: BuildHasher + Default
        {
            group_map::into_group_map(self)
        }
    
    but the current Rust doesn’t know how to do this and writes something like
    error: defaults for type parameters are only allowed in `struct`, `enum`, `type`, or `trait` definitions
        --> src/lib.rs:3076:29
         |
    3076 |     fn into_group_map<K, V, R = RandomState>(self) -> HashMap<K, Vec<V>, R>
         |                             ^^^^^^^^^^^^^^^
         |
         = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
         = note: for more information, see issue #36887 <https://github.com/rust-lang/rust/issues/36887>
         = note: `#[deny(invalid_type_param_default)]` on by default
    
  2. Option 2, we can try replacing all imports with a flag from std::collections::HashMap to ahash::HashMap, but this breaks backward compatibility

Maybe there are some other options?

a14e avatar Oct 16 '23 20:10 a14e

This issue seems to be a duplicate of #322.

a14e avatar Oct 16 '23 20:10 a14e

Other possible options:

  1. Generalize the current implementation and create duplicates of methods with custom hashers (for example, HashMap has a constructor new and with_hasher).
  2. Create some kind of middleware, and call, for instance, vec![1, 2, 3].iter().grouping::<ahash::HashMap>, and implement all grouping options on top of this middleware. What do you think?

a14e avatar Oct 16 '23 21:10 a14e

As a maintainer or not, I certainly would like to customize the hasher too. Related to #462 which contains comments written by phimuemue that I agree with and that could interest you.

Maybe we should create a label for this kind of issues?

Philippe-Cholet avatar Oct 20 '23 12:10 Philippe-Cholet