electionguard-python icon indicating copy to clipboard operation
electionguard-python copied to clipboard

🐞 HashedElGamal is not compliant with the spec

Open danwallach opened this issue 2 years ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

The code in hmac.py (in particular, the get_hmac() function) is being used as part of HashedElGamalCiphertext for two very different purposes.

For the first purpose, computing an HMAC, it seems to be correct. Example, where the spec says to use HMAC (e.g., $c_2 = HMAC(k_0, c_0 | c_1)$), the code says:

mac = get_hmac(mac_key, to_mac)

This is fine.

The ElectionGuard spec notes that the actual encryption of the message ($c_1 = m_1 \oplus k_1 | m_2 \oplus k_2 | \cdots | m_{b_m} \oplus k_{b_m}$) is supposed to use a NIST 800-108-compliant key derivation function (KDF). The Python code is not compliant with the NIST spec. In particular, the code that creates those $k_i$ values does this:

        data_key = get_hmac(
            session_key.to_hex_bytes(),
            encryption_seed.to_hex_bytes(),
            bit_length,
            (i + 1),
        )

So, what's going on inside get_hmac? Its input to the HMAC function is ultimately the byte concatenation of three values: start_byte + msg + end_byte where start_byte is the index $i$, and end_byte is the message length. The NIST spec says:

K(i) := PRF (KI, [i] || Label || 0x00 || Context || [L])

Suggested fixes below.

Expected Behavior

It's confusing to use get_hmac to serve two radically different purposes. First and foremost, it would be good to have a KDF class that implements the NIST spec correctly and then use it to compute the $k_i$'s. Anyway, even if you don't do that, there are a few things that you do need to do:

  • You're not doing anything for the NIST-defined Label or Context strings (okay for a start, but these strings serve a purpose and should be defined and used)
  • You're not using the zero byte after the Label (Suggestion: add the byte.)
  • You're using precisely 32 bits to encode the two numbers ($i$ and $L$), when it's possible (if unlikely) that this code might be used to encrypt more than 4 billion blocks. (Suggestion: use a variable-length encoding for lengths.)
  • You should not be converting keys to hexadecimal bytes. Those are originally ElementModQ values, which already fit into 32 bytes. This encoding process yields more bits than HMAC-SHA256 normally supports for its key values. (Suggestion: extract the internal BigInteger as exactly 32 big-endian bytes.)

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

danwallach avatar May 27 '22 21:05 danwallach

FWIW, in both our Kotlin and TypeScript code, we've implemented a UInt256 class, as distinct from arbitrary bigints or ElementModQ. This turns out to make everything involving hash and MAC computation much more straightforward. We then have helper methods to convert from UInt256 to ElementModQ and back again when necessary.

danwallach avatar May 27 '22 22:05 danwallach

Thanks @danwallach this is helpful. We've discussed and will certainly adopt these or variations for 2.0. We will unlikely be able to accommodate them for 1.0.

rc-ms avatar Jun 03 '22 21:06 rc-ms