Add a compile-time `hex_decode_array()`
This adds a constexpr `hex_decode_array()` that takes a string literal and decodes it into a statically sized array at compile time. It also adds a user-defined literal `""_hex` wrapping this functionality. Admittedly, this is somewhat of a kink of mine. Nevertheless, I think it can be useful as a utility for us internally and also for applications.
Essentially, this allows for easier-to-read binary constants throughout the code, as one doesn't have to break them into `uint8_t` literals, like so: `std::array{0xba, 0xad, 0xca, 0xfe}`. Instead, one would write: `"baadcafe"_hex`.
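To make the idea concrete, here is a minimal sketch of how such a constexpr decoder plus literal could look; the helper names (`hex_nibble`, `StringLiteral`) and all implementation details are assumptions for illustration, not the actual patch:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Decode a single hex digit; when evaluated at compile time, the throw on
// invalid input becomes a compile error.
constexpr uint8_t hex_nibble(char c) {
   if(c >= '0' && c <= '9') { return static_cast<uint8_t>(c - '0'); }
   if(c >= 'a' && c <= 'f') { return static_cast<uint8_t>(c - 'a' + 10); }
   if(c >= 'A' && c <= 'F') { return static_cast<uint8_t>(c - 'A' + 10); }
   throw std::invalid_argument("invalid hex character");
}

// Decode a hex string literal into a fixed-size byte array.
// N includes the terminating '\0', so N - 1 hex characters yield (N - 1) / 2 bytes.
template <std::size_t N>
constexpr auto hex_decode_array(const char (&hex)[N]) {
   static_assert((N - 1) % 2 == 0, "expected an even number of hex characters");
   std::array<uint8_t, (N - 1) / 2> out{};
   for(std::size_t i = 0; i < out.size(); ++i) {
      out[i] = static_cast<uint8_t>((hex_nibble(hex[2 * i]) << 4) | hex_nibble(hex[2 * i + 1]));
   }
   return out;
}

// Carrier type so the literal's characters are available as a non-type
// template parameter (C++20 "string literal operator template").
template <std::size_t N>
struct StringLiteral {
   constexpr StringLiteral(const char (&str)[N]) {
      for(std::size_t i = 0; i < N; ++i) {
         value[i] = str[i];
      }
   }

   char value[N];
};

template <StringLiteral S>
consteval auto operator""_hex() {
   return hex_decode_array(S.value);
}

// Usage: a std::array<uint8_t, 4> holding {0xba, 0xad, 0xca, 0xfe}
constexpr auto magic = "baadcafe"_hex;
static_assert(magic[0] == 0xba && magic[3] == 0xfe);
```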
What do you think?
As example usages, I translated the data values in `twofish_tab.cpp` and refactored the pcurves EC parameter loading. The latter already used a similar constexpr mechanism. For the former, `nm twofish_tab.o` confirms that all decoding was done at compile time: the tables end up as symbols in the read-only data section (`R`), with no dynamic initialization left. The output is equivalent to that of the code on latest master:
```
0000000000000000 R _ZN5Botan7Twofish2Q0E
0000000000000100 R _ZN5Botan7Twofish2Q1E
0000000000000200 R _ZN5Botan7Twofish2RSE
0000000000000300 R _ZN5Botan7Twofish11EXP_TO_POLYE
0000000000000400 R _ZN5Botan7Twofish11POLY_TO_EXPE
0000000000000500 R _ZN5Botan7Twofish4MDS0E
0000000000000900 R _ZN5Botan7Twofish4MDS1E
0000000000000d00 R _ZN5Botan7Twofish4MDS2E
0000000000001100 R _ZN5Botan7Twofish4MDS3E
```
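As an aside, since `hex_decode_array()` is usable in constant expressions, the same property can also be asserted directly in code. A generic sketch, assuming the new function is in scope:

```cpp
// This only compiles if the decoding happens during constant evaluation.
static_assert(hex_decode_array("baadcafe")[3] == 0xfe);
```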
I like it. I use the very similar `hex!` macro in Rust all the time. (Haven't reviewed at all but conceptually :+1:)
Not necessarily a big fan of the hex suffix here. For short strings

`decode_hex("FOOF")`

reads fine, and for long strings

`"ABCDE... 1000s of chars ..."_hex`

it's not clear what's happening until you get all the way to the end.
That whitespace has to be omitted also makes this a lot less applicable. I guess it's hard to know the output length at compile time without that, at least without a hint. But, for example, in the Twofish tables there is structure, and it's incredibly hard to match up the table elements with reference values when it's all one gigantic contiguous string. There are other cases where this is not so (the bcrypt constant input block comes to mind; there are others, I'm sure).
Also, a definite no on making this part of the public API, at least at this time. Internally we have a lot of hex constants and this could be nice to have in some situations; it's not so clear this applies to users, at least not commonly. (Outside of footgunning themselves with compile-time constant keys!) And as ever, if it's in the public API we're stuck with it for a long time…
Lastly, the mp decoder accepting single-digit or odd-length inputs was a completely intentional design decision, and wanting to support that is the reason I did not even attempt to promote it to a general (not consteval, not outputting anything but word-ish types) compile-time hex decoder. It's ok to duplicate a few lines of code sometimes ;)
> Not necessarily a big fan of the hex suffix here.
I feel the same, actually. As you say, the intent isn't clear because the suffix isn't necessarily obvious. In some contexts this can be mitigated by deliberately stating the variable type as `std::array<uint8_t, ...>`, but that can't be enforced. Frankly, I had never used user-defined literals before and wanted to try them. Totally fine for me to remove it again.
That said, I find that the literal communicates the compile-time nature of the transformation fairly clearly. And it does introduce less visual clutter for hard-coded (short) strings than the function-style variant, e.g. for things like domain separation strings.
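For instance (an illustrative snippet assuming the `_hex` literal from this PR is in scope, not code from the patch):

```cpp
// Spelling out the type makes the intent clear even if the suffix is overlooked...
constexpr std::array<uint8_t, 8> domain_sep = "00112233deadbeef"_hex;

// ...whereas with auto the reader has to spot the _hex suffix to know what this is.
constexpr auto context_str = "00112233deadbeef"_hex;
```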
> That whitespace has to be omitted also makes this a lot less applicable. I guess it's hard to know the output length at compile time without that, at least without a hint.
Yes, that's the reason I required whitespace to be omitted. I'm fairly certain one could add more template trickery to figure out the output length regardless. Perhaps another day. 😏 While I agree that this makes for a trade-off for the 32-bit values in Twofish, I would argue that those are mostly write-once-read-never values, except for things like external audits. And for that, an explanatory comment would probably mitigate it enough. Or am I missing something?
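Something along these lines is what I have in mind as an explanatory comment (made-up values, assuming the `hex_decode_array()` from this PR is in scope):

```cpp
// Made-up example: the 16 bytes correspond, in order, to rows 0 and 1 (8 bytes each)
// of the reference table; the comment documents the layout that the contiguous
// hex string no longer shows.
constexpr auto EXAMPLE_TABLE = hex_decode_array("00112233445566778899aabbccddeeff");
```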
> Also, a definite no on making this part of the public API, at least at this time. Internally we have a lot of hex constants and this could be nice to have in some situations; it's not so clear this applies to users, at least not commonly.
In fact, I jumped into this rabbit hole because I'd love to have it for one of our applications. Namely, for hard-coded short strings (salts, context strings, ...) as mentioned above. It's certainly reasonable to keep it out of the public API and gain some experience with it first, though. I'm assuming you mean the user-defined literal and are fine with the constexpr hex_decode in the public API, no?
> It's ok to duplicate a few lines of code sometimes
I somehow foresaw you saying that, tbh. 😅 While tinkering, I realized that the mp decoder uses essentially the same construction and used it as a testbed. I'll remove that again.
Summing up what I understood:
- [x] Remove the user-defined literal from the public API
- [x] Don't use the literal for the Twofish constants (in favor of `hex_decode_array()`)
- [x] Leave the 32-bit literals in Twofish as-is (at least until whitespace may be omitted at compile time)
- [x] Remove the refactoring in pcurves (probably except one fix in secp256r1 where a constant was prefixed with 0x, which seemed unintentional to me)
(it might be one or two days until I can get around to making those changes)
I moved the `_hex` literal into a new (internal) module in `utils/literals`. This can be a place to add further user-defined literals if this proves useful, e.g. `_b64` for compile-time base64 decoding, `_bigint` to create a `BigInt` straight from a hex string, or things like `_hexv` to produce a `std::vector<>` filled with the decoding result of some hex string.
Also, I removed the overly eager usages in Twofish and pcurves. Instead, I drive-by-modernized some TLS code that contained hard-coded binary constants, to showcase the usage of this new literal.
> That whitespace has to be omitted also makes this a lot less applicable.
Turns out that ignoring whitespace is fairly trivial when using the user-defined literal, but not for the constexpr function taking an ordinary parameter. The former is implemented via a "string literal operator template" (our `StringLiteral` class), which C++20 allows for the consteval implementation of user-defined string literals. That way, the content of the string literal is available at compile time and can be used to determine the size of the output array, i.e. the following compiles just fine:
```cpp
constexpr std::array<uint8_t, 4> baadbeef = "ba ad be ef"_hex;
```
For the ordinary function, that's unfortunately not the case. Even when declared `consteval`, the following doesn't compile, because the parameter `str` is not considered "known at compile time" (function parameters are never constant expressions):
```cpp
template <std::size_t N>
consteval std::size_t count_char_in_literal(const char (&str)[N], char ch) {
   // error: 'str' is not usable in a constant expression
   constexpr auto ws = std::count_if(str, str + N, detail::is_hex_whitespace);
   return ws;
}
```
Taking the string literal as a template parameter would probably be a suitable workaround, but the resulting syntax is too strange for my tastes, such as: `hex_decode_array<"baadcafe 1337beef">()`. One could probably make it work using a macro, but let's not go there.
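For completeness, here is a sketch of how the template-parameter route can derive the output size while ignoring whitespace; again, the names and all details are assumptions for illustration, not the code in this PR:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Because the literal's characters live in a non-type template parameter,
// counting the hex digits is itself a constant expression and can size the
// output array, so whitespace can simply be skipped while decoding.
template <std::size_t N>
struct StringLiteral {
   constexpr StringLiteral(const char (&str)[N]) { std::copy_n(str, N, value); }

   constexpr std::size_t hex_digits() const {
      return static_cast<std::size_t>(std::count_if(value, value + N, [](char c) {
         return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
      }));
   }

   char value[N];
};

template <StringLiteral S>
consteval auto operator""_hex() {
   static_assert(S.hex_digits() % 2 == 0, "expected an even number of hex digits");
   std::array<uint8_t, S.hex_digits() / 2> out{};
   std::size_t nibbles = 0;
   for(char c : S.value) {
      int v = -1;
      if(c >= '0' && c <= '9') { v = c - '0'; }
      else if(c >= 'a' && c <= 'f') { v = c - 'a' + 10; }
      else if(c >= 'A' && c <= 'F') { v = c - 'A' + 10; }
      if(v < 0) { continue; }  // skip whitespace and the terminating '\0'
      out[nibbles / 2] = static_cast<uint8_t>((out[nibbles / 2] << 4) | v);
      ++nibbles;
   }
   return out;
}

// The output size (4 bytes) is derived from the literal's content, whitespace and all.
constexpr std::array<uint8_t, 4> baadbeef = "ba ad be ef"_hex;
static_assert(baadbeef[0] == 0xba && baadbeef[3] == 0xef);
```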