collection icon indicating copy to clipboard operation
collection copied to clipboard

Avoid custom code to combine hash codes, instead use `Object.hash` / `Object.hashAll`

Open mkustermann opened this issue 2 years ago • 2 comments

Currently the DeepCollectionEquality code has it's own implementation of combining hashcodes, see lib/src/equality.dart:

class _MapEntry {
  ...
  int get hashCode =>
      (3 * equality._keyEquality.hash(key) +
          7 * equality._valueEquality.hash(value)) &
      _hashMask;
}

class MapEquality<K, V> implements Equality<Map<K, V>> {
  ...
  int hash(Map<K, V>? map) {
    if (map == null) return null.hashCode;
    var hash = 0;
    for (var key in map.keys) {
      var keyHash = _keyEquality.hash(key);
      var valueHash = _valueEquality.hash(map[key] as V);
      hash = (hash + 3 * keyHash + 7 * valueHash) & _hashMask;
    }
    hash = (hash + (hash << 3)) & _hashMask;
    hash ^= (hash >> 11);
    hash = (hash + (hash << 15)) & _hashMask;
    return hash;
  }
}

We may want to consider avoiding that, and instead use Object.hash / Object.hashAll in the core libraries, which implementation can optimize according to various modes (web or non-web, JIT/AOT, 32-bit/64-bit, compressed pointers or not)

/cc @lrhn

mkustermann avatar Jan 04 '23 10:01 mkustermann

I think this code is trying to avoid doing two iterations and any intermediate allocations. It still does a lookup for each key, which is also bad, but possibly only constant-factor bad (assuming the map has constant-time lookup).

It's is valid performance concern to avoid linear allocations or double iteration, but that means we can't directly use Object.hash/Object.hashAll.

We could do Object.hashAll(map.entries.expand((e) => [_keyEquality.hash(e.key), _valueEquality.hash(e.value)]), but that allocates a MapEntry and a two-element list for each entry of the map.

Or Object.hashAll(map.keys.map(_keyEquality.hash).concat(map.values.map(_valueEquality.hash))), which is probably only a constant amount of allocation (five iterables, two closures), but does two iterations.

If we could expose an interface to hashing, like:

class Hasher<H> {
  static const Hasher<Object?> defaultHasher = _SystemHasher();

  H initialState();

  // May modify state and return the same object, or return a new state.
  // Always use the new value for all further operations.
  H update(H state, int value);

  int finalize(H state); 
}

based on the same primitives as the platform hash, then we could write this as:

  int hash(Map<K, V>? map) {
    const hasher = Hasher.defaultHasher;
    var state = hasher.initialState();
    map.forEach((k, v) {
      state = hasher.update(hasher.update(state, _keyEquality.hash(k)), _valueEquality.hash(v));
    });
    return hasher.finalize(state);
  }

In other words, the current platform hash operations are not great for someone wanting to write their own hash for a collection.

lrhn avatar Jan 04 '23 10:01 lrhn

If we could expose an interface to hashing, like ...

:+1: I've also mentioned this use case in bottom of https://github.com/dart-lang/sdk/issues/50693. Maybe we should consider adding this to corelibs?

mkustermann avatar Jan 04 '23 11:01 mkustermann