refterm Implemented mitigations for ComputeGlyphHash

In its current implementation, the hash function ComputeGlyphHash is not cryptographically secure. Multiple O(1) attacks were demonstrated (https://m1el.github.io/refterm-hash/). This change implements some mitigations for these attacks.

This includes:

Using a better padding method. Zero padding is not secure.
Using Davies-Mayer hash construction.

To my best estimate, these attacks have negligible performance impact, since the most expensive operations are preserved as is, and very few new operations are added.

Please note that these mitigations DO NOT make this hash cryptographically secure, because the underlying encryption function (apply aesdec round four times) is not a proper use of AES, and it is NOT cryptographically secure.

Oct 30 '21 18:10 m1el

I'm fine with this change, but a) this is not a cryptographic hash in the first place, and b) I'm not sure how to assess the improvement of this change over the previous version without spending a considerable amount of time actually analyzing these two functions to prove to myself that one is more collision resistant than the other in a way that anyone would care about in practice?

- Casey

Oct 30 '21 18:10 cmuratori

As I see it, there are 2 attacks possible from this:

The good ol' hash table DoS attack. It is assumed the attackers control input. They enter input that all collide to the same hash value, thereby reducing the hash table from an O(1) structure to an O(n) structure. This was a big problem back in 2011, especially when input was interpreted form the web. This is the reason siphash was invented with a dynamic seed.
One could potentially craft an input that would render differently in the terminal, thereby obscuring the real input. Let's say an attacker craft a set of bytes that hides a command like "rm / -fr" but is shown as something benign in the terminal.

Please correct me if I'm wrong - I've only spent a total of 5 minutes going through the source.

Oct 31 '21 18:10 Genbox

I believe #2 is the primary possibility we are talking about. I don't know that #1 is particularly feasible. My question was more about me not knowing if the fix was demonstrably better, not about whether it should be fixed or not.

To elaborate, the hash is only used for non-direct-mapped elements. This means that if you are using standard ASCII, for example, you will never use the hash. So to a first approximation, you would only be talking about valid Unicode strings. The attacker would have to be able to craft a valid Unicode sequence, which a Unicode parser would consider continuous for glyph generation, that has the same hash as another valid Unicode glyph sequence.

I don't actually know to what extent that would ever be possible, even with the code that is currently in there. The attacks demonstrated did not do that (they assumed you would be able to input ASCII and have it hash, but of course, most ASCII is never hashed through the hash). I assume there would be some you could do in Unicode, but, I have not looked.

Similarly, you could not replace something like "rm / -fr" because that is ASCII. So the command line will always display correctly regardless of what you do to the hash.

- Casey

Oct 31 '21 21:10 cmuratori

refterm refterm copied to clipboard

Implemented mitigations for ComputeGlyphHash

refterm
refterm copied to clipboard