wycheproof icon indicating copy to clipboard operation
wycheproof copied to clipboard

Remove notes bugType?

Open FiloSottile opened this issue 3 months ago • 2 comments

Categorizing notes is sometimes very tricky, as some issues straddle the line. That takes time, and in my experience it was the slowest part of adding my ML-KEM vectors. Is that effort worth it? Does anyone use those categorizations? If not, we should remove them.

/cc @cpu

FiloSottile avatar Sep 01 '25 12:09 FiloSottile

in my experience it was the slowest part of adding my ML-KEM vectors.

Digging deeper at this, was it because you felt like multiple bugTypes were appropriate and it was hard to pick one, because there wasn't a bugType that matched what you were looking at, or something else entirely?

Is that effort worth it? Does anyone use those categorizations?

I think it would help to answer the 2nd before deciding on the 1st. I'd be more willing to put effort into keeping or improving this aspect of the vectors if it was important to downstream users.

I spent some time with GitHub code search as a first crack at finding anyone using these. Outside of the vectors themselves I only saw bugType ref'd in some language-specific structured representations of the JSON data (e.g. like this). I didn't spot any users that were interrogating the value.

cpu avatar Sep 01 '25 14:09 cpu

We're not using them.

Some minor classification might be useful to us, since we do process flags programmatically, but nowhere near as fine-grained as the bugTypes are now, and it honestly doesn't matter much. (We're getting by without it now.)

We use the flags to resolve whether an acceptable result should have been valid or invalid. Because while Wycheproof doesn't know if we, say, support EC compressed points, we know if we do and can use the notes to pick a side or another. The model we settled on is passing in a set of "allowed flags". If every flag is allowed, the acceptable result is valid. Otherwise, it is invalid. This... works okay.

Working through an update now, this does mean that we have to go in and fix the set of allowed flags on update, since a lot of flags are just descriptive. But I think this isn't a huge deal. Having to tweak our test driver code as the tests evolve is fine, and unfortunately a lot of legacy primitives have so much variability that you need a fairly rich "public API" for the implementation to configure the acceptable tests. With more effort, I'm sure we could come up with a better public API, but meh.

davidben avatar Sep 01 '25 14:09 davidben