containers icon indicating copy to clipboard operation
containers copied to clipboard

better instance Hashable IntSet?

Open jwaldmann opened this issue 2 years ago • 9 comments

cross reference to https://github.com/haskell-unordered-containers/hashable/issues/269

jwaldmann avatar Jul 17 '23 16:07 jwaldmann

for a given set of elements, the IntSet representation is unique? E.g., "the mask is the largest position ..." https://github.com/haskell/containers/blob/master/containers/src/Data/IntSet/Internal.hs#L261

Then a hash function could just use structure and contents of the internal representation generically?

jwaldmann avatar Jul 19 '23 06:07 jwaldmann

@jwaldmann Correct.

treeowl avatar Jul 19 '23 07:07 treeowl

Correct, but it would be wasteful. The Tips contain all the elements, it would be sufficient to just hash the contents of those (ignoring Bins).

meooow25 avatar Jul 19 '23 13:07 meooow25

Correct, but it would be wasteful. The Tips contain all the elements, it would be sufficient to just hash the contents of those (ignoring Bins).

Oh yeah, I think there is redundant info there. I'd have to review the representation; it's been a little while.

treeowl avatar Jul 19 '23 14:07 treeowl

Ah yes, the Tip includes the prefix.

jwaldmann avatar Jul 19 '23 15:07 jwaldmann

We could offer

serialize :: (Word -> r -> r) -> r -> IntSet -> r

This could be used to hash efficiently, with serialize written so it can inline. What would be a good matching deserialize type though?

treeowl avatar Jul 19 '23 15:07 treeowl

  • serialize: could we write instance Ord IntSet with that? (#787 #470)
  • deserialize: don't know. what would be the use case?

jwaldmann avatar Jul 20 '23 12:07 jwaldmann

I doubt that's quite what we'd want for Ord, but maybe close. The use case is that I'd be more comfortable with "publicly" exporting serialize/deserialize than a function that's only good for writing the Hashable instance.

treeowl avatar Jul 20 '23 12:07 treeowl

I think we should not mix up serialization and hashing.
Serialization typically requires the size before the elements, which means we must make two traversals for IntMap/IntSet.
Hashing also wants the size for a good hash, but it is fine to add it at the end, which means we can be more efficient and calculate the hash and size in one traversal.

meooow25 avatar Mar 15 '25 05:03 meooow25