core icon indicating copy to clipboard operation
core copied to clipboard

HashMap / HashSet use fixed seed (0) in xxHash32 — vulnerable to HashDoS attacks

Open quake opened this issue 6 months ago • 2 comments

Hello Moonbit team,

I noticed that the current implementation of HashMap / HashSet in the core library uses xxHash32 with a fixed seed = 0 as the default hashing strategy.

This design choice has some serious security implications:

Problem

  1. xxHash32 is not collision-resistant. It was designed for speed, not for adversarial scenarios.
  2. With a fixed seed 0, the hash function becomes deterministic and predictable.
  3. An attacker can precompute a large set of colliding keys that all map to the same hash value.
  4. By sending these crafted keys to any Moonbit program that stores untrusted input in a HashMap/HashSet, the hash table operations degrade from O(1) average time to O(n) or worse, leading to CPU exhaustion DoS attack.

This is exactly the kind of vulnerability that affected PHP, Python, Java, Ruby, and many other languages in the past (see HashDoS 2011 advisory).

Benchmark Evidence

I wrote a small benchmark to demonstrate:

  1. SequentialKey1 – a set of sequential keys, no collisions.
  2. SequentialKey2 – another set of sequential keys, aslo no collisions.
  3. CollidingKey – 64 keys that all collide under xxHash32.
test (b : @bench.T) {
  let key1 = [
    "key0", "key01", "key02", "key03", "key004", "key005", "key006", "key007", "key008",
    "key009", "key010", "key011", "key012", "key013", "key014", "key015", "key016",
    "key017", "key018", "key0019", "key0020", "key0021", "key0022", "key0023", "key0024",
    "key0025", "key0026", "key0027", "key0028", "key0029", "key0030", "key0031",
    "key0032", "key0033", "key0034", "key0035", "key0036", "key0037", "key0038",
    "key0039", "key0040", "key0041", "key0042", "key0043", "key0044", "key0045",
    "key0046", "key0047", "key0048", "key0049", "key0050", "key0051", "key0052",
    "key0053", "key0054", "key0055", "key0056", "key0057", "key0058", "key0059",
    "key0060", "key0061", "key0062", "key0063",
  ]
  b.bench(name="SequentialKey1", fn() {
    let map = @hashmap.new()
    for i in 0..<key1.length() {
      map.set(key1[i], i)
    }
  })
  let key2 = [
    "cat0", "cat01", "cat02", "cat03", "cat004", "cat005", "cat006", "cat007", "cat008",
    "cat009", "cat010", "cat011", "cat012", "cat013", "cat014", "cat015", "cat016",
    "cat017", "cat018", "cat0019", "cat0020", "cat0021", "cat0022", "cat0023", "cat0024",
    "cat0025", "cat0026", "cat0027", "cat0028", "cat0029", "cat0030", "cat0031",
    "cat0032", "cat0033", "cat0034", "cat0035", "cat0036", "cat0037", "cat0038",
    "cat0039", "cat0040", "cat0041", "cat0042", "cat0043", "cat0044", "cat0045",
    "cat0046", "cat0047", "cat0048", "cat0049", "cat0050", "cat0051", "cat0052",
    "cat0053", "cat0054", "cat0055", "cat0056", "cat0057", "cat0058", "cat0059",
    "cat0060", "cat0061", "cat0062", "cat0063",
  ]
  b.bench(name="SequentialKey2", fn() {
    let map = @hashmap.new()
    for i in 0..<key2.length() {
      map.set(key2[i], i)
    }
  })
  let key3 = [
    "key0", "key15", "key72", "key94", "key337", "key349", "key387", "key407", "key451",
    "key456", "key474", "key572", "key706", "key720", "key724", "key828", "key929",
    "key932", "key999", "key1097", "key1115", "key1186", "key1246", "key1271", "key1304",
    "key1350", "key1353", "key1385", "key1409", "key1445", "key1456", "key1568",
    "key1703", "key1746", "key1768", "key1805", "key1903", "key1916", "key2073",
    "key2088", "key2149", "key2233", "key2277", "key2358", "key2370", "key2395",
    "key2429", "key2457", "key2529", "key2555", "key2558", "key2663", "key2665",
    "key2725", "key2738", "key2755", "key2831", "key2846", "key2889", "key2892",
    "key3109", "key3152", "key3181", "key3183",
  ]
  b.bench(name="CollidingKey", fn() {
    let map = @hashmap.new()
    for i in 0..<key3.length() {
      map.set(key3[i], i)
    }
  })
}

Benchmark results on my machine:

name           time (mean ± σ)         range (min … max) 
SequentialKey1    4.64 µs ±   0.06 µs     4.57 µs …   4.75 µs  in 10 ×  20681 runs
SequentialKey2    4.36 µs ±   0.02 µs     4.33 µs …   4.39 µs  in 10 ×  22589 runs
CollidingKey     10.54 µs ±   0.03 µs    10.49 µs …  10.58 µs  in 10 ×   9458 runs

Even with only 64 colliding keys, the runtime already doubles compared to non-colliding cases. With larger numbers of colliding keys, the cost grows roughly O(n²), which makes DoS attacks very feasible.

Benchmark results with 256 keys on my machine:

name           time (mean ± σ)         range (min … max) 
SequentialKey1   20.56 µs ±   0.20 µs    20.39 µs …  21.03 µs  in 10 ×   4697 runs
SequentialKey2   21.34 µs ±   0.11 µs    21.13 µs …  21.45 µs  in 10 ×   4668 runs
CollidingKey    116.14 µs ±   0.22 µs   115.80 µs … 116.47 µs  in 10 ×    857 runs

If you want to reproduce this issue yourself, I have created a small script that generates colliding keys for xxHash32(seed=0). You can use these generated keys with the current HashMap / HashSet implementation to easily observe the performance degradation caused by collisions.

Suggested Fix

  1. Do not use a fixed seed (0) by default.
  2. Instead, initialize the seed with a random value at program startup (similar to Rust’s RandomState or Go’s runtime hash seeding).
  3. This makes it impractical for an attacker to precompute collisions, since the seed changes on every run.
  4. Consider using a stronger keyed hash function (e.g., SipHash or AHash) for default hashing in security-sensitive contexts. (If performance is the main concern, xxHash32 is fine with random seeding)

Why this matters

Most developers will rely on HashMap/HashSet without thinking about hash function details. The current implementation exposes every Moonbit application that accepts untrusted keys (e.g. from HTTP requests, JSON parsing, etc.) to potential DoS attacks.

Making random seeding the default behavior provides a safe baseline without affecting existing APIs.

Importantly, in the recent years, all mainstream production-ready languages and runtimes (including Rust, Go, Java, Python, JavaScript/Node.js, Ruby, PHP, etc.) have already adopted randomized hash seeding to mitigate HashDoS attacks. Using a fixed seed is no longer considered a safe or modern default.

quake avatar Aug 27 '25 07:08 quake

We'll try to address this ASAP.

peter-jerry-ye avatar Aug 27 '25 10:08 peter-jerry-ye

One possible solution is to store colliding entries as a self-balanced binary tree, so worst time complex is O(log(N)). But this makes hashmap require Compare trait.

hackwaly avatar Aug 28 '25 17:08 hackwaly