botan icon indicating copy to clipboard operation
botan copied to clipboard

Cooperative Cancellation in PasswordHash

Open huven opened this issue 2 months ago • 14 comments

Fixes #5123

Todo:

  • [X] Check std::stop_token cross-platform support
  • [x] Unittest
  • [x] Make all PasswordHash subclasses check stop_token
    • [x] Argon2
    • [x] PBKDF2
    • [x] Scrypt
    • [x] OpenPGP-S2K
    • [x] Bcrypt-PBKDF
  • [x] Drop added supports_cooperative_cancellation() when all done
  • [x] Update documentation

Note:

  • The deprecated API (non-PasswordHash based) has not been touched
  • I've added the default std::nullopt only to the base class. This is not inherited by virtual overrides, which I think is ok as the default will be taken from the static class. Similar to other methods like from_params.
  • macOS runners have been updated from 13/14 to 26
  • Added -march=armv6 flag to bare metal build (to ensure availability of atomic operations)
  • OpenPGP-S2K has been excluded from unit-testing, as it cannot be tuned above a few seconds on modern hardware, which is impossible to test reliably on GitHub runners.

huven avatar Oct 20 '25 19:10 huven

Coverage Status

coverage: 90.614% (-0.01%) from 90.628% when pulling 43ae704836ba264a2cbdf3ad14a8b162e912f723 on huven:pwhash-stoppable into afebb96a0c5e3a4be220463ece6bcf0748f75659 on randombit:master.

coveralls avatar Oct 20 '25 21:10 coveralls

Looking at the CI outputs, we seem to get in trouble with macOS (Xcode 13 and 14) and Android NDK. For both of which we claim to only support the "latest" version. Both could be updated on our CI: Xcode (to 26 that comes with Clang 17) and the NDK (from 28 to 29) [1, 2] which may or may not fix this issue.

Additionally the build configuration for arm32-baremetal (which links to -lnosys) fails because the symbol for __atomic_fetch_sub_4 is missing. I'm guessing that the stop token needs this for synchronization. Not sure whether that would become a showstopper, frankly.

reneme avatar Oct 21 '25 07:10 reneme

For macOS/iOS I know for sure it works with recent Xcode, as I already included a basic implementation in my app, works perfectly 😄.
It looks like the current images use clang 14 and 15 while stop_token was introduced at 16.

Will have some time later today to dive into this.

huven avatar Oct 21 '25 07:10 huven

Merged both PRs you linked in here, let's see what happens..

huven avatar Oct 21 '25 08:10 huven

Merged both PRs you linked in here, let's see what happens..

Thanks. I was about to propose exactly that.

The macOS 26 PR currently doesn't work -- some Python-based test fails. But the build succeeds, despite the stop token usage. Similarly, NDK r29 seems to ship the stop token as well. So both are probably a green light (if we're willing to ditch support for older NDKs and Xcodes). 🙂

That leaves the linking issue on the "arm32-baremetal" CI job which doesn't seem to provide the required atomic functionality. 🙁

reneme avatar Oct 21 '25 09:10 reneme

One python test fails indeed on macos-26. I see the test is disabled on Windows.

I have one Mac here at 26, interestingly enough all tests run ok on it (though an immediate second run fails the cli_tls_socket_tests test (Error: server bind failed), but that looks like another issue).

Will debug this a bit further, would be nice if the cli_tls_proxy_tests would fail on my device too.

huven avatar Oct 21 '25 09:10 huven

Didn't succeed in reproducing the tls_proxy test failure. Installed both boost and python with the exact same version as in GitHub runner, copied the configure.py command verbatim, all tests still pass 😢

For now, I'm tempted to add macOS to the Windows exception for that test and leave fixing the test to another issue.

[...]
   INFO: Ran cli_tls_proxy_tests in 0.41 sec
[...]
Ran 238 tests with 0 failures in 7.82 seconds

huven avatar Oct 21 '25 12:10 huven

The arm32 cross-compile might need a -latomic added to the linker phase, will test that quick&dirty first. If that works it needs to be added somewhere in the configure system.

huven avatar Oct 21 '25 12:10 huven

After some more experimenting (the -latomic didn't help, will revert that), it seems that the atomic primitives are only available if the cpu supports them, so e.g.

arm-none-eabi-c++ -std=c++20 -march=armv6 -O2 -specs=nosys.specs  stop_token_test.cpp

works like a charm, as ARMv6 introduced LDREX/STREX. Lowering that

arm-none-eabi-c++ -std=c++20 -march=armv5t -O2 -specs=nosys.specs  stop_token_test.cpp
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /tmp/cc1A7gy2.o: in function `std::stop_token::stop_requested() const':
stop_token_test.cpp:(.text._ZNKSt10stop_token14stop_requestedEv[_ZNKSt10stop_token14stop_requestedEv]+0x14): undefined reference to `__sync_synchronize'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /tmp/cc1A7gy2.o: in function `main':
stop_token_test.cpp:(.text.startup+0x30): undefined reference to `__atomic_fetch_add_4'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: stop_token_test.cpp:(.text.startup+0x70): undefined reference to `__atomic_fetch_sub_4'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: stop_token_test.cpp:(.text.startup+0x88): undefined reference to `__atomic_fetch_sub_4'
[...]

Assuming Botan is used on pre-ARMv6 architectures, that will require some additional work, like providing a custom implementation of some atomic operations, using some kind of locking.
Or throwing in a lot of #ifdef's in the code, which I prefer to avoid..

huven avatar Oct 21 '25 14:10 huven

After giving this some thought: I think it's a good idea to decide first if it is safe to assume that ARMv6 is available, or if prior architectures need to be supported too.

  • If assuming ARMv6 is ok, the -march=armv6 can be added to CI fixing the build error.
  • If ARMv5 and older need to be supported, this feature will need a flag and use #ifdef's in the code.

huven avatar Oct 21 '25 17:10 huven

To followup my own question: I will assume ARMv6 or higher, so added that to the CI build. Should succeed now (let's see).

And I've added a unit test to test the stop_token with Argon2 (can be extended to other pwdhashes once they support the stop_token). The test only succeeds if an Invalid_State exception is thrown (i.e. if the derivation was aborted).

huven avatar Oct 27 '25 22:10 huven

To followup my own question: I will assume ARMv6 or higher, so added that to the CI build.

For the sake of this PR, assuming that is certainly fine. I have no way of knowing whether that holds in general though.

Given that we already wrap std::mutex and std::lock_guard, we could consider similarly wrapping the stop token. If a platform doesn't support it (e.g. because atomics are disabled for bare metal), it could simply fall back to a dummy implementation and ignore the cancellation request.

Wrapping crossed my mind too. I tried to avoid that as derive_key is public API, having the std:: type there makes the API easier to use, as the caller will know how the argument behaves.

I also considered #ifdef's inside the prototype, wrapping the prototype with ifdef/else, and overloading with ifdef. All of them have some arguments in favor and against (readability, duplication of code/comments).

I ended up assuming ARMv6 is ok🤞

huven avatar Oct 28 '25 12:10 huven

I ended up assuming ARMv6 is ok🤞

I'm guessing it is. Existing users still have the escape hatch of disabling the modules that use password-based key derivation if they don't need that for their use case. @randombit gets the final say on that, though. :)

reneme avatar Oct 29 '25 06:10 reneme

@randombit @reneme I think this PR is ready for review, looking forward to your comments if any further improvements are necessary. The opening comment contains a Note section with some thoughts and important changes to review.

huven avatar Nov 01 '25 18:11 huven