
Performance issues with linear hashmap

Open mb64 opened this issue 5 years ago • 7 comments

The hashmap in Data.HashMap.Mutable.Linear is orders of magnitude slower than Data.HashMap.Lazy for large inputs.

To Reproduce

I implemented the same code with both Data.HashMap.Mutable.Linear and Data.HashMap.Lazy in this gist. On my computer, I get:

$ hyperfine ./lin-mut ./imm
Benchmark #1: ./lin-mut
  Time (mean ± σ):      87.6 ms ±   2.0 ms    [User: 85.6 ms, System: 2.0 ms]
  Range (min … max):    85.1 ms …  94.1 ms    33 runs
 
Benchmark #2: ./imm
  Time (mean ± σ):       9.2 ms ±   0.5 ms    [User: 6.7 ms, System: 2.6 ms]
  Range (min … max):     8.5 ms …  13.2 ms    251 runs

Here the mutable linear version is 10x slower. Changing the stopping point from 30_000 to 300_000 makes it 100x slower:

$ hyperfine ./lin-mut ./imm
Benchmark #1: ./lin-mut
  Time (mean ± σ):     10.156 s ±  0.162 s    [User: 10.118 s, System: 0.028 s]
  Range (min … max):   10.026 s … 10.509 s    10 runs
 
Benchmark #2: ./imm
  Time (mean ± σ):      90.0 ms ±   1.7 ms    [User: 83.5 ms, System: 6.3 ms]
  Range (min … max):    88.0 ms …  94.2 ms    31 runs

All programs were compiled with ghc --make -O2 -fllvm <input>.hs -o <output>.

Expected behavior

I would expect the (nominally O(1)) mutable linear hashmap to have better asymptotics than the O(log n) persistent immutable hashmap.

Environment

  • OS name + version: Linux 64-bit. I'm not using Stack or Nix.
  • Version of the code: GHC HEAD (92377c27e1a48d0d3776f65c7074dfeb122b46db), linear-base master (69d2d3331673fc0b637767127ca8f5cb1bdb70f7). In case it's relevant, I had to make a few minor changes to the primitive library to get it to compile with GHC HEAD.

Additional context

I was unable to figure out how to use unordered-containers with the custom-built GHC, so the faster times are built with GHC 8.10.2. Changing to Data.Map.Lazy produces times only slightly slower (20%) than Data.HashMap.Lazy, and is consistent across compiler versions, so I'm not too worried about it.

This feels like a laziness issue -- I haven't looked into the internals of Data.HashMap.Mutable.Linear, so I'm not really sure where it might be coming from.

mb64 avatar Dec 16 '20 06:12 mb64

Some dead ends:

I expected that this was a laziness issue, but it doesn't seem to be. Here's my branch with everything strict. It only shaves off about a second: 10 seconds down to 9. (And it stays at 9 seconds even with {-# LANGUAGE Strict #-}, so laziness is not the problem here.)

Two other things I tried were disabling bounds checking on array accesses (using unsafeRead and unsafeWrite), and tuning the RTS opts for the GC (-A32M as recommended by the Haskell Wiki when using mutable arrays seemed to work best). Both of these shaved off a second each, getting it down to 7 seconds in all.

Nowhere near a 100x improvement unfortunately.

mb64 avatar Dec 16 '20 12:12 mb64

Hi @mb64 ,

Thanks for looking into this. It's very interesting.

To be honest, none of this has been benchmarked yet: for the first iteration of linear-base, we were focused on feasibility. Though I must say that a linear hashtable is not very useful if it's not, in fact, faster than HashMap.

Now, just eyeballing your results, the sensitivity to the size of the hashtable kind of suggests resizing issues to me (it could be all sorts of things: the criterion for resizing being too loose, or too strict, or the resizing process itself being slow). Another possibility, looking at the benchmark code, is that we are inserting values at position 0 a lot. And maybe there is some kind of bug when overwriting an existing value?

This is all very vague speculation at this point.
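To make the resizing trade-off concrete, here is a generic illustration of the kind of criterion an open-addressing table typically uses (a sketch only, not linear-base's actual code; the names and the 0.75 threshold are choices made for this example):

```haskell
-- Generic illustration of a resize criterion, not linear-base code.
-- Too high a threshold makes probe chains long; too low a threshold
-- triggers resizes (and the full rehashes they imply) too often.
maxLoad :: Double
maxLoad = 0.75

-- Resize once the occupancy crosses the load-factor threshold.
needsResize :: Int -> Int -> Bool
needsResize size capacity =
  fromIntegral size >= maxLoad * fromIntegral capacity

-- Doubling keeps the amortized cost of all resizes linear in the
-- number of insertions.
newCapacity :: Int -> Int
newCapacity = (* 2)

main :: IO ()
main = print (needsResize 3 4, needsResize 2 4, newCapacity 4)
```

With doubling, each element is rehashed only a constant number of times on average, so if resizing dominates the runtime, either the criterion or the rehashing itself is suspect.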

aspiwack avatar Dec 16 '20 13:12 aspiwack

Your intuition was correct -- resizing is most of the slowdown. Using empty and insertAll instead of fromList to set the initial capacity brings the time down from 7 seconds to 0.5 seconds.

I took a look at the PSLs (probe sequence lengths), and there doesn't seem to be any issue with the number 0 specifically: the maximum PSL belongs to the key 7088.

I'm not sure what a healthy PSL distribution is supposed to look like, but for this map, the 5-number summary is (0, 0, 0, 149, 510). In case it helps, here's a list of every Nth one, in sorted order, for a better understanding of what the distribution looks like.
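For readers following along: the PSL of an entry in a linear-probing table is just its distance from its ideal slot, wrapping around the table. A minimal sketch of how such a summary can be computed (illustrative only, not the code actually used here):

```haskell
import Data.List (sort)

-- Probe sequence length: how far an entry landed from its ideal slot,
-- wrapping around the end of the table.
psl :: Int -> Int -> Int -> Int
psl capacity ideal actual = (actual - ideal) `mod` capacity

-- Five-number summary (min, lower quartile, median, upper quartile,
-- max) of a list of PSLs, as quoted above.
fiveNum :: [Int] -> (Int, Int, Int, Int, Int)
fiveNum xs =
  let s = sort xs
      n = length s
      q k = s !! min (n - 1) (k * n `div` 4)
  in (head s, q 1, q 2, q 3, last s)

main :: IO ()
main = print (fiveNum [0, 0, 0, 1, 0, 149, 510, 2, 0, 3])
```

In a healthy table at a moderate load factor, the whole distribution stays small (single digits); a maximum of 510 means some lookups scan hundreds of slots.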

mb64 avatar Dec 18 '20 10:12 mb64

It's bad hash values for Ints!

hash for Int is id, so the skewed input distribution resulted in a non-uniform distribution within the hashmap. Using a different hash, with

{-# LANGUAGE GeneralizedNewtypeDeriving, DeriveGeneric #-}

newtype I = I Int
  deriving (Linear.Eq, Eq, Ord, Show, Num, Enum, Real, Integral, Generic)

instance Hashable I where
  -- hashWithSalt used "in reverse": x is the salt and 12345 the value,
  -- so sequential keys get scrambled by hashable's mixing step.
  hash (I x) = x `hashWithSalt` (12345 :: Int)

brought the overall time (resizing and all) down to 0.25 seconds, and the maximum PSL went from 510 to 5.

I suppose Data.HashMap.Lazy, with its worst-case logarithmic time, is just more resilient to bad hash functions than a standard hash table.

The best way to resolve this for the library is probably to salt the hashes by default, as a guard against non-uniform input hash distributions. If that seems reasonable to you, I'll make a PR later.

mb64 avatar Dec 18 '20 11:12 mb64

Ah, yes. Hash tables are susceptible to this sort of attack in a way that trees are not.

I was aware that hash = id for Ints. Even hashWithSalt is sequential for Ints (you cleverly use it in reverse and get a better hash this way). I thought that your input values were random enough for it not to show. I was evidently wrong.

But even if they were, the issue remains: hashtables can behave pathologically (even inadvertently, as in this case, because the default hash has poor distribution properties), and that needs to be solved. I agree with your proposal to salt the hashes by default. But I don't think that just replacing hash with the hashWithSalt function will do the trick, since hashWithSalt is still sequential on integers. You can, however, incorporate a salt in some cleverer way; you will know by looking at your benchmark.
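One such "cleverer" way (a sketch of the idea only, not linear-base code: the constants are SplitMix64's standard finalizer constants, and defaultSalt is a name and value invented for this example) is to xor in a salt and then run the result through a strong bit-mixing finalizer, so that even sequential inputs come out scattered:

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- SplitMix64's finalizer: a bijective mixing function in which every
-- output bit depends on every input bit.
mix64 :: Word64 -> Word64
mix64 z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
  in z2 `xor` (z2 `shiftR` 31)

-- Hypothetical fixed salt; ideally it would be drawn at random at
-- program start.
defaultSalt :: Word64
defaultSalt = 0x9e3779b97f4a7c15

-- Salt, then mix: sequential Ints no longer map to sequential hashes.
saltedHash :: Int -> Int
saltedHash x = fromIntegral (mix64 (fromIntegral x `xor` defaultSalt))

main :: IO ()
main = mapM_ (print . saltedHash) [0 .. 3]
```

Because mix64 is a bijection, distinct keys are guaranteed distinct hashes; the salt only determines which permutation of the key space you get.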

I think we also want, ideally, this salt to be random, so that we can protect against attackers trying to trigger this behaviour artificially and DoS servers that use such a hashtable (I'm not sure linear hashtables are tremendously useful in a server-type application, but you never know). But it doesn't have to be done this time around. We can keep an issue open for later.

Thanks a lot for investigating this!

aspiwack avatar Dec 18 '20 11:12 aspiwack

@mb64 just a quick note: I'm going on holiday roughly now, and all of the maintainers of linear-base will be on holiday next week. Don't be surprised if it takes a little while to merge your PR.

Thanks again for your work on this, and happy holidays.

aspiwack avatar Dec 18 '20 16:12 aspiwack

Hashable's hash is a terrible hash function for anything other than a HAMT. The HAMT in HashMap gets away with it because it effectively consumes the nybbles of the hash backwards. If we're willing to roll our own hash function, a simple tabulation hashing scheme over the input bytes yields hashes good enough that you should be able to use a naive linear-probing hash table directly on the result. Basically you just need one CAF holding individual per-byte tables of random words, xored together. Unfortunately, it is completely incompatible with the standard Hashable class.

https://arxiv.org/abs/1011.5200
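A minimal sketch of the scheme described above, for 64-bit keys, with two simplifying assumptions: the tables are filled by a toy deterministic generator (a real implementation would fill one CAF from an OS entropy source) and plain lists stand in for unboxed arrays:

```haskell
import Data.Bits (shiftR, xor, (.&.))
import Data.Word (Word64)

-- SplitMix64's finalizer, used here only to populate the tables with
-- pseudo-random words so the example is self-contained.
mix :: Word64 -> Word64
mix z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
  in z2 `xor` (z2 `shiftR` 31)

-- One 256-entry table of random words per byte position of the key.
tables :: [[Word64]]
tables =
  [ [ mix (fromIntegral (pos * 256 + b) + 1) | b <- [0 .. 255] ]
  | pos <- [0 .. 7 :: Int] ]

-- Tabulation hash: look each byte of the key up in its own table and
-- xor the results together.
tabHash :: Word64 -> Word64
tabHash w =
  foldr xor 0
    [ t !! fromIntegral ((w `shiftR` (8 * i)) .&. 0xff)
    | (t, i) <- zip tables [0 ..] ]

main :: IO ()
main = mapM_ (print . tabHash) [0 .. 3]
```

Keys differing in a single byte differ by the xor of two distinct table entries, which is what gives tabulation hashing its strong independence guarantees despite the trivial per-byte lookups.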

ekmett avatar Feb 17 '21 10:02 ekmett