Botan4
Botan4 Tracking Issue
It took almost 2.5 years from when development stopped on Botan2 until Botan 3.0 was released. That was suboptimal.
The goal this time around is for the entire Botan3->Botan4 development phase to happen within a single release cycle. One quarter there is a Botan 3.y.0 feature release, same as every quarter; the next quarter sees the release of Botan 4.0.0, plus a Botan3 patch release as required.
This requires effectively pre-loading almost all of the development work in advance, such that once master moves to 4.0-pre the process is mostly just hitting merge-merge-merge on a pile of already existing PRs. Thus the very long lead time (and the desire for a memorable ticket number, since this meta issue will be open for years - still sad I missed out on #4444).
Note
For issues or questions related to specific subtasks, please open up a new issue for discussion, and reference this ticket. Otherwise, this issue is likely to become clogged with many unrelated conversations.
General Outline
Botan4 is still C++20. The gains we'd get from C++23/C++26 seem minimal (not zero, but nothing amazing) and language bumps do have an effect on end users. Hopefully MISRA eventually allows C++20...
Minimum compiler version increases are TBD. The minimum Clang version will certainly increase, so that we can use std::source_location (prior work done by @KaganCanSit that could be resurrected: #5084). Increasing the minimum GCC version to 14 would ensure we always have __builtin_addc, which would be useful for optimization purposes. At least Clang 17 is needed, for the CWG 2518 fix.
Timeline
TBD. Maybe mid 2027?
Significant Development Work
The bigger projects
- [ ] Split public key and private key types
- [ ] Remove DHE support from TLS
- [ ] Remove RSA key exchange from TLS
- [ ] Remove CBC ciphers from TLS
- [ ] Some cleanups are possible after removing kyber_90s and dilithium_aes
- [ ] Possibly ML-KEM vs Kyber and ML-DSA vs Dilithium cleanups
- [ ] Support large element OIDs
- [ ] Internal EC data cleanups after removing BigInt EC point logic
- [ ] BigInt public API cleanups
- [ ] Resurrect changes in #5084
For some of these it may be possible to do significant amounts of work on master beforehand to get ready, which will make life simpler. In particular, for DHE/RSA/CBC in TLS it may be possible to just make them optional (i.e. tls does not have a hard dependency on the dh module); this improves things on master immediately for those who would prefer a smaller attack surface, and makes the final incompatible change easier. Likewise, for splitting the key types and for the ML-KEM/ML-DSA cleanups, I think a lot of the prep work can be done without violating SemVer.
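One way to read "make them optional" is ordinary conditional compilation: the TLS layer only offers a finite-field DH group when the corresponding module was built. The sketch below is purely illustrative; EXAMPLE_HAS_DH stands in for a build-system feature macro, and offered_key_exchange_groups() and its group list are hypothetical, not Botan's actual TLS policy code.

```cpp
#include <string>
#include <vector>

// Hypothetical: EXAMPLE_HAS_DH would be defined by the build system only when a
// DH module is enabled. Nothing here is Botan's real TLS policy API.
inline std::vector<std::string> offered_key_exchange_groups() {
   // Groups that never need the DH module
   std::vector<std::string> groups = {"x25519", "secp256r1"};
#if defined(EXAMPLE_HAS_DH)
   // Only compiled, and only offered, when DH support is present
   groups.push_back("ffdhe2048");
#endif
   return groups;
}
```

With the macro undefined the function returns only the two elliptic-curve groups, so a build without the DH module still compiles and simply never offers a DHE group.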
Incompatible Changes
Miscellaneous, mostly easy changes. There is no need to preload these; they can probably all be done over a weekend.
- [ ] Removing ~all deprecated functions/classes
- [ ] Remove the deprecated elliptic curve groups
- [ ] Remove the deprecated DL groups
- [ ] #4684
- [ ] Remove serialization/deserialization of EC identity element
- [ ] Remove PBKDF and subclasses
- [ ] Hide PasswordHash headers
- [ ] Remove deprecated PK padding aliases
- [ ] Headers going internal: numthry.h, reducer.h, compiler.h
- [ ] Underscore-prefix all internal functions (e.g. create_encryption_op) [also consider using some nasty_botan_internal_prefix instead of just _]
Module Removal
If you, dear reader, are relying on any of these modules/algorithms in your code, please open a sub-issue for discussion as soon as possible, so we are aware of your usage. Do not reply to this ticket about specific modules.
- [ ] Decide if cryptobox should be removed in Botan4
- [ ] Decide if dilithium_aes should be removed in Botan4
- [ ] Decide if dlies should be removed in Botan4
- [ ] Decide if gost_28147 should be removed in Botan4
- [ ] Decide if gost_3411 should be removed in Botan4
- [ ] #4721
- [ ] Decide if kyber_90s should be removed in Botan4
- [ ] Decide if legacy_ec_point should be removed in Botan4
- [ ] Decide if lion should be removed in Botan4
- [ ] Decide if mce should be removed in Botan4
- [ ] Decide if md4 should be removed in Botan4
- [ ] Decide if noekeon should be removed in Botan4
- [ ] Decide if shake_cipher should be removed in Botan4
- [ ] Decide if tpm (and also uuid) should be removed in Botan4
  - Note: tpm is the TPMv1-only module; tpm2, supporting TPMv2, is staying
Currently deprecated but not removed
These are already deprecated and certainly not desirable, but will be kept for Botan4:
- crc32
- dsa
- gost_3410
- md5
- siphash
- streebog
- x919_mac
Proposed, now cancelled, incompatible changes
- #4678
Completely removing modules will make people reconsider Botan as a reliable, stable library moving forward. Myself included.
While I appreciate that algorithms become less secure over time, many applications need to maintain compatibility with other systems that still use them, for whatever reason. Being able to validate, for example, a CRC32 or MD5 hash that has been received from another application/system is still a valid use case. (Or even using insecure algorithms in systems that do not require absolute security, such as file hashing for data storage in games.)
May I humbly recommend that they're disabled from the builds by default, but can still be re-enabled for those who really need them?
Completely removing modules will make people reconsider Botan as a reliable, stable library moving forward.
By clearly documenting "X is deprecated and may be removed in a future major release" and then, years later, removing X in a future major release?
You make an argument for CRC32 and MD5. Which are pretty plausible I guess. Can you make a similar statement about Lion, Noekeon, or HyMES McEliece? How about other algorithms which similarly have been added and later removed like Blue Midnight Wish, DESX, Square, Skipjack, FORK-256, CS-Cipher, SHARK, ThreeWay, Rabin-Williams, EMAC, MISTY1, Kasumi, or CAST-256?
Already the cost of removing a feature is high; once something is added we have to maintain it for years and years, at least until a new major release. If the cost of removing a feature is literally infinite - once added we are stuck with it forever - then the bar to adding an algorithm becomes equally high. For example the preliminary versions of Kyber and Dilithium would not have been merged, if it was impossible to remove them later. Instead we would have waited (years!) for the final NIST competition results. And only then would we have been able to start work on post quantum secure TLS.
If there is a specific algorithm (say CRC32 or MD5) whose removal you wish to argue against, please do open a subticket and we can discuss. I am very open to arguments of the form "I am currently using, or might in the future want to use, (this specific thing) and removing it will be an inconvenience for me." (see for example #4678 or #4685) I am however not interested in an argument of the form "You cannot remove anything ever for any reason." If you have been operating under the impression that - in a major release - we would never remove some number of {experimental,insecure,obsolete,obscure} algorithms, after a clearly documented deprecation phase, then I regret to inform you that this may not be the library for you.
that they're disabled from the builds by default, but can still be re-enabled for those who really need them?
That could be a path to take for something like MD5, which is small and self-contained. For many of these, however, their mere existence implies a significant ongoing maintenance burden because, compiled by default or not, they still need to compile. For example, supporting 90s mode in Kyber requires contorting the whole implementation.
Obviously there's a trade off between maintainability and backwards compatibility. And I'm not suggesting that everything should be kept unnecessarily.
But that said, if something is self-contained and can easily be kept, personally speaking, I would try to keep it. I might even be tempted to break out chunks of code that aren't easily separated and keep them in a v3 compatibility module.
My original point being people who rely on third party libraries should be able to rely on them. There's nothing worse than having to waste time rewriting huge chunks of code just because someone else (on the team, not taking a swipe at you here) decides they will only ever work with the latest and greatest version of every third-party library in the system. I work with a few people like that already, and it's not fun constantly playing catch-up.
But this is not my project, and you are free to do what you like, obviously. As someone who has recently switched to Botan from another library, I just wanted to make my thoughts known. :)
My original point being people who rely on third party libraries should be able to rely on them.
I am not trying to make people's lives hard here. Literally a single random person showing up any time in the next two years and saying "I am actively using X" is sufficient for something to be struck from the list of potential changes. My actual belief is that (outside of possibly DSA, MD5, X9.19 MAC and CRC32) there are literally zero users of these anywhere.
What's the reasoning for removing keccak?
It is widely used in the Ethereum cryptocurrency.
[Moved to #4721]
Another vote for keeping MD5 and CRC32 (specifically). As mentioned, there are still plenty of protocols (e.g. STUN) that use them and it'd be unfortunate to have to drag separate implementations in, even if it wouldn't be the end of the world. I'm assuming deprecated* features aren't going to trigger compile warnings, because that'd also be unfortunate.
*although I'd argue against deprecating them at all.
I'm curious as to where work on DTLS fits into this, e.g. DTLS 1.3, or at least fixing some of the bugs in DTLS 1.2 (such as https://github.com/randombit/botan/issues/4782)?
My thoughts as someone who loves botan:
- I'd be hesitant to remove MD4. Maybe make it optional? I could see this being used if someone were to, e.g., have their own HIBP database and want to verify passwords against NTLM hashes or something, where MD4 is (unfortunately) used. That said, obviously it should (never!) be used in new code.
- Why would we want to remove TPM support? Removing UUIDs I guess makes sense; there are all kinds of ways of generating them, though having Botan do it is definitely a nice convenience. But pushing TPM support to be fully cross-platform would honestly be a really nice bonus feature. People should IMO use the TPM more often.
As for the other removals, I don't use most of them, so I don't really have a stake. But I'd love to see DTLS improved, if that's possible (I know it has some flaws, which may be due to the protocol itself and not any crypto implementation, but I could be wrong).
Would it be worth adding the Noise framework? The only one I know of that's in modern C++ is mine, but it's far from perfect and could use a lot of improvement. But if that's too far out of scope that's fine too.
I'd be hesitant to remove MD4.
It's an unfortunate one because it is still, sadly, in use in various niches. This is a common element to the remaining we-really-should-remove-this algorithms -- the obscure and rarely implemented ones were already removed years ago.
Why would we want to remove TPM support?
The TPM deprecation is for TPMv1, not affecting TPMv2. TPMv1 is restricted to SHA-1 only, and isn't included on modern systems.
Would it be worth adding the Noise framework?
Yes Noise framework is absolutely in scope, but there is no need there to wait for Botan4, new features are added all of the time. The major version bump is only because we're removing features and/or breaking APIs.
But I'd love to see DTLS improved, if that's possible
DTLS absolutely needs some love and I have not had the time to dedicate to it recently.
It's an unfortunate one because it is still, sadly, in use in various niches. This is a common element to the remaining we-really-should-remove-this algorithms -- the obscure and rarely implemented ones were already removed years ago.
Yeah, completely understandable. I'd love to see it gone, but there are still (and probably always will be) some valid use-cases for it, even if niche.
The TPM deprecation is for TPMv1, not affecting TPMv2.
Aha. That should probably be clarified in the original comment -- it reads (at least to me) as though all TPM support is possibly going to go.
Yes Noise framework is absolutely in scope, but there is no need there to wait for Botan4, new features are added all of the time. The major version bump is only because we're removing features and/or breaking APIs.
Ah okay. I'd be happy to help with Noise, or you can just use the impl I wrote (it's public domain and not too hard to integrate/use). Shameless plug, I know.
DTLS absolutely needs some love and I have not had the time to dedicate to it recently.
It's hard for me to actually come to a conclusion on DTLS, particularly given that there seems to be a lot of conflicting information about it. Some sources say it's bad and shouldn't be used, others say it's fine... yada yada yada. Anyway, it would be great to see it get some love at some point.
DTLS in Botan works pretty well for me. There are a couple of documented issues where it breaks on a flaky connection due to the transmission logic, which necessitates closing and re-opening the connection to try again. It would be great if it were upgraded to support DTLS 1.3, but it's perfectly usable as is.
The reconnection issues I definitely want to look into. BoGo (BoringSSL's TLS testing framework, which we also use) has tests for DTLS timeout/retransmission scenarios, some of which are disabled at the moment.