Use a more common JSON formatting
When writing some test vectors, I noticed that the JSON formatting is a bit uneven. It looks like the original Wycheproof test put a space before the : in JSON dictionaries, which is a bit uncommon. I see we've already had other test vectors, such as ML-DSA, which use the more conventional pattern: only a space after the :.
I propose we use a uniform encoding, and use one without the extra space. (I.e. we run a tool to rewrite all the old vectors.) The value of a more common encoding is that contributors can more easily add test vectors without a ton of diff noise. This came up when I was putting together https://github.com/C2SP/wycheproof/pull/161 and had to figure out how to reproduce the original formatting. It happens that Python has a separators parameter to reproduce it but other APIs (JS, Go) only control the indent . We should pick a convention that the most tools can reproduce easily.
(Even better would be if there were some tool for folding a contributor's test vector set into the combined file, and then the formatting would be whatever that tool thinks it is, but either way I think we would be better served by a more conventional formatting.)
Thanks for opening this issue.
I propose we use a uniform encoding, and use one without the extra space. (I.e. we run a tool to rewrite all the old vectors.)
+1 - this has been on my mind as well. I think it makes sense to do a pass over the existing data and then enforce consistency going forward in CI with the same tool.
Even better would be if there were some tool for folding a contributor's test vector set into the combined file
Agreed, @FiloSottile and I discussed a similar idea in an earlier conversation. I think that's a place where I'd like to invest some time/energy after #151 (though if someone else is keen to get started before I can get to it I'm happy to support their effort!)
Happy to put up a PR if it would help. Mostly I didn't want to collide with the work you're doing in #151.
Thanks for the offer :-) I think we can get the in-flight PRs merged in the next day or so. If you wanted to open a PR for the JSON formatting after that I would be grateful.
@davidben things took a little longer than my initial projection (surely an unprecedented phenomenon for software engineering estimates 😅) but I think things are relatively stable now if you're still up for contributing a PR for this.
Sure! Although in a similar unprecedented software engineer phenomenon, I realized that canonicalizing the indent with standard JSON implementations very quickly bumps into how to canonicalize the field order. On the one hand, sorting keys is very easy and something every tool can do. On the other hand, it is a bit sad to not put tcId first, etc.
I think I can get the Python JSON implementation to preserve the order, so that's probably the place to start with a quick rewrite pass, but what to do after that is less clear. If we want maximum effort, an order derived from the order in the schema would be very tidy... and also definitely require a lot of tooling work.
Hmm, okay, some other tricky things if I compare what Python did and look at the diff with all spaces removed:
- AES-FF1 tests use lists of integers for their ciphertexts. Python will put one integer per line, which is annoying for the really big ones.
- A few other short arrays like
"key_ops": ["encrypt", "decrypt"]get more newlines. That's probably not the end of the world. - If I leave
ensure_ascii=True, we lose the Unicode comments in ML-KEM tests. If I setensure_ascii=False, AES-FF1 tests, which include things like\u007in the message, get non-printable stuff in there.
I don't know much about AES-FF1 or FPE, so I don't yet have any intelligible opinions here. I need to poke around that one. (We could start by just reindenting everything but AES-FF1 I suppose.)
On the one hand, sorting keys is very easy and something every tool can do. On the other hand, it is a bit sad to not put tcId first, etc.
🤔 I don't feel super strongly but I view the JSON as intended for machine consumption vs human consumption. With that framing I wonder if just alpha-sorting the keys isn't the end of the world. I agree that conceptually having the tcId first (or ordering by schema fields) is nicer, but I also don't know that it's an important enough property to be worth the extra work.
I think I can get the Python JSON implementation to preserve the order, so that's probably the place to start with a quick rewrite pass
That works for me. If nothing else it gives other interested parties a chance to weigh in.
AES-FF1 tests use lists of integers for their ciphertexts.
Hmmm I see that NIST is using hex encoding for the input plaintexts and expected ciphertexts (draft-celi-acvp-symmetric 8.2, example JSON) alongside an alphabet (draft-celi-acvp-symmetric 7.3-25) for their AES FF1 testcases which is at least some small evidence that might be a viable route.
I don't know much about AES-FF1 or FPE, so I don't yet have any intelligible opinions here. I need to poke around that one. (We could start by just reindenting everything but AES-FF1 I suppose.)
I'm in the same boat w.r.t FPE so leaving it as a special case for now sounds good to me. If we can get the JSON sorted for other vectors I think it's still a win.
@str4d I saw you have a crate for FPE w/ AES-FF1, but it isn't using the Wycheproof vectors AFAICT. Do you have any thoughts on the encoding discussion with your implementer hat on?
The easiest way to change the formats would likely be to use the formatters in the test vector generation code. I had to write my own formatter because the linters at Google gave me thousands of warnings with any available JSON formatter.
I added the text format for the FPE tests because it is often easier to use. I also added the format using list of integers so that it is possible to test cases with radix >= 256.
I also have written tests for the fpe crate as well as other implementations of FF1 (btw fpe passes everything). So far I have not encountered any problems with the formats.
(draft-celi-acvp-symmetric 7.3-25) for their AES FF1 testcases which is at least some small evidence that might be a viable route.
Hmm. That seems unable to represent anything above radix 64.
The easiest way to change the formats would likely be to use the formatters in the test vector generation code.
It's great to have the old test vector generation code available, but to get to a set of test vectors that more than one person can add to, I think we need something that can ingest test vectors as they are, rather than reconstruct them.
I added the text format for the FPE tests because it is often easier to use. I also added the format using list of integers so that it is possible to test cases with radix >= 256.
The alphabet question is largely what value the string-based InvalidPlaintext tests provide. Specifically if the \u0007 is testing something in particular or does, say, space, work just as well? Talking to some folks, it sounds like the answer is we can just use a space, and then that resolves the ensure_ascii question?
More generally, I suppose the question is what's the expected interface for an AES-FF1 library? If it takes a list of integers, then string-based InvalidPlaintext test doesn't do anything because the error will be caught at the driver, before it even makes it to the implementation under test. If it's list-based, how does that interact with an alphabet-based implementation? I guess it would need to pick a mapping for characters out of range?
For the sake of using standard JSON formatting, perhaps the lists should be replaced with something dumb like "take all the integers and encode as big-endian u16s, concatenate, then hex-encode the lot of it". That puts a bit more processing into the test driver code, but then we don't get a line per integer and keep to printable characters, so we don't need to wrestle with escapes, or whether the strings should be interpreted as UTF-8 or UTF-16.
I also have written tests for the fpe crate as well as other implementations of FF1 (btw fpe passes everything). So far I have not encountered any problems with the formats.
For something sustainable, I don't think it's enough for one person to have tested a library once. Rather, the test driver should live in the library. Then the tests can run continuously as we improve the tests and as the library changes. Historically I think this area has not been so great. It's often unclear how a library developer is intended to use the test vectors... do I need to run both the string- or list-based FF1 tests? Are they two views of the same tests, or will I be missing cases if I run one and not the other? The testvectors vs testvectors_v1 mess being the biggest example of this.
I would thus err more towards having single formats, provided they have enough fidelity to test everything. If it turns out we need both to test some of everything, so it goes. (Makes sense to have both P1363 and DER ECDSA tests, for example.)
@davidben I took a crack at moving this forward with the simplest approach I could manage. WDYT? https://github.com/C2SP/wycheproof/pull/183