electionguard-python
🗑 Remove delimiters in hashes
Feature Request
Description @benaloh proposed not using any delimiters between the parameters of hashes. The idea is to simplify the hash itself. It would also remove confusion about whether any delimiters are used between parameters of hashes.
Another option would be to state clearly in the documentation exactly what the delimiters are and how they are used.
Proposed Solution
This would involve removing the delimiters shown in these lines: https://github.com/microsoft/electionguard-python/blob/e35b68dc3175243ffaf872e9a0ea56c5b8a05830/src/electionguard/hash.py#L73
https://github.com/microsoft/electionguard-python/blob/e35b68dc3175243ffaf872e9a0ea56c5b8a05830/src/electionguard/hash.py#L100
This is marked as a question because we are unsure whether to implement this, despite it being a simple fix.
We need to be a bit careful here. We can and should remove delimiters wherever possible, but we must make certain that parameters are always of predetermined length. We have to avoid "no table" and "not able" looking the same.
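A minimal sketch of the "no table" versus "not able" problem, using plain `hashlib` rather than the code in hash.py. The helper names `hash_concat` and `hash_delimited` are hypothetical and exist only for this illustration.

```python
import hashlib


def hash_concat(*parts: str) -> str:
    """Hash the plain concatenation of the parts, with no delimiters."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
    return h.hexdigest()


def hash_delimited(*parts: str) -> str:
    """Hash the parts joined with a '|' delimiter."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Both argument lists concatenate to "notable", so the digests match.
same_without_delimiter = hash_concat("no", "table") == hash_concat("not", "able")

# "no|table" and "not|able" are different byte strings, so the digests differ.
same_with_delimiter = hash_delimited("no", "table") == hash_delimited("not", "able")
```

The variable-length string arguments are what make the collision possible; with arguments of predetermined length the split points are fixed and the ambiguity cannot arise.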
For most of our hashes (especially, generation of challenges), we are hashing a fixed sequence of fixed-length arguments. Here we can just drop the delimiters and concatenate. For less constrained data (e.g., the ballot coding file), we either need delimiters or the file itself should specify the lengths of each argument.
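The two cases above could be sketched roughly as follows. This is an illustration only, not the library's implementation: the widths `Q_BYTES` and `P_BYTES` are assumptions (256-bit and 4096-bit values), and `fixed_width`, `hash_fixed`, and `hash_length_prefixed` are hypothetical helpers.

```python
import hashlib
from typing import Tuple

Q_BYTES = 32   # assumed width of a mod-q value (256 bits)
P_BYTES = 512  # assumed width of a mod-p value (4096 bits)


def fixed_width(value: int, width: int) -> bytes:
    """Encode a non-negative integer as exactly `width` big-endian bytes.

    int.to_bytes raises OverflowError if the value is negative or too large.
    """
    return value.to_bytes(width, "big")


def hash_fixed(*fields: Tuple[int, int]) -> bytes:
    """Hash a fixed sequence of (value, width) pairs by plain concatenation."""
    h = hashlib.sha256()
    for value, width in fields:
        h.update(fixed_width(value, width))
    return h.digest()


def hash_length_prefixed(*parts: bytes) -> bytes:
    """Hash variable-length parts, each preceded by a 4-byte length prefix."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()
```

For less constrained data, the length prefix plays the role that a delimiter would otherwise play, without reserving any byte value as special.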
I'm not sure I like nuking the delimiters. Consider that we're currently converting everything to base16. That means we can easily construct an ElementModP for which a corresponding number of ElementModQ values, concatenated together, have the exact same hex digits. The delimiters prevent these from hashing identically.
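To make the base16 concern concrete, here is a small sketch with toy-sized integers standing in for the real element types. The values and the helpers `to_hex` and `hash_hex` are hypothetical; the real elements are far larger, but the same effect applies to any variable-width hex encoding.

```python
import hashlib
from typing import Sequence


def to_hex(n: int) -> str:
    """Variable-width uppercase hex, with no padding and no prefix."""
    return format(n, "X")


def hash_hex(parts: Sequence[int], delimiter: str = "") -> str:
    """Hash the hex encodings of the parts, joined by the given delimiter."""
    payload = delimiter.join(to_hex(p) for p in parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


one_large = [0xABCDEF0123456789]      # stand-in for a single large element
two_small = [0xABCDEF01, 0x23456789]  # stand-ins for two smaller elements

# Without delimiters, both lists encode to "ABCDEF0123456789".
collide_without_delimiter = hash_hex(one_large) == hash_hex(two_small)

# With "|" between elements, the payloads differ.
collide_with_delimiter = hash_hex(one_large, "|") == hash_hex(two_small, "|")
```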
If anything, I'm concerned that strings are passed through to the hash function without any escaping. That's only an issue if a string can be externally specified. The solution would be to use the kind of string escaper that shows up when encoding for JSON or for CSV.
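As a rough illustration of the escaping idea, the sketch below uses `json.dumps` from the standard library as the escaper. The helpers `hash_raw` and `hash_escaped` are hypothetical, and JSON quoting is just one possible choice of escaper.

```python
import hashlib
import json
from typing import Sequence


def hash_raw(parts: Sequence[str]) -> str:
    """Join unescaped strings with '|' and hash the result."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def hash_escaped(parts: Sequence[str]) -> str:
    """JSON-quote each string before joining, so an embedded '|' or '"'
    inside a string cannot be mistaken for a field boundary."""
    quoted = [json.dumps(p) for p in parts]
    return hashlib.sha256("|".join(quoted).encode("utf-8")).hexdigest()


# An externally supplied string containing the delimiter forges a boundary.
raw_collides = hash_raw(["a|b"]) == hash_raw(["a", "b"])

# After quoting, the payloads are '"a|b"' and '"a"|"b"', which differ.
escaped_collides = hash_escaped(["a|b"]) == hash_escaped(["a", "b"])
```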
The concern above is precisely why the arguments must be of known fixed-length. If the input to a hash is an ElementModP, then one can't swap it with a sequence of 16 ElementModQ values and have it be a legitimate input.
Having identical inputs of different forms is a concern even with delimiters. One could say that "A|B" with A and B as numbers could collide with a hex string when the delimiter "|" is written as its ASCII equivalent "7C". We must always check that the inputs are of proper form regardless, and once we do, we no longer need delimiters for known fixed-length inputs.
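A sketch of both points, under stated assumptions: the first part shows that the ASCII bytes of "A|B" are exactly the bytes 41 7C 42, so a delimited string and a raw number can coincide; the second is a hypothetical `hash_checked` helper that validates the count, type, and range of each argument before concatenating fixed-width encodings. The widths passed in are assumptions, not values taken from the library.

```python
import hashlib
from typing import Sequence

# "A|B" as ASCII is 0x41 0x7C 0x42, the same bytes as the number 0x417C42.
delimited_text = "A|B".encode("ascii")
raw_number = (0x417C42).to_bytes(3, "big")
collides = hashlib.sha256(delimited_text).digest() == hashlib.sha256(raw_number).digest()


def hash_checked(values: Sequence[int], widths: Sequence[int]) -> str:
    """Check every argument against its expected byte width, then hash
    the fixed-width encodings with no delimiters."""
    if len(values) != len(widths):
        raise ValueError("wrong number of arguments")
    h = hashlib.sha256()
    for value, width in zip(values, widths):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("argument is not an integer")
        if not 0 <= value < (1 << (8 * width)):
            raise ValueError("argument out of range for its width")
        h.update(value.to_bytes(width, "big"))
    return h.hexdigest()
```

With a check like this, passing sixteen 32-byte values where a single 512-byte value is expected fails on the argument count before anything is hashed.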