rustls-ffi
rustls-ffi copied to clipboard
0.14.0: Adopt crypto provider API, use aws-lc-rs as default provider
This branch updates rustls-ffi to support the cryptography provider model established upsteam in rustls since version 0.22, and later refined in 0.23. By the end of the changeset rustls-ffi, the unit tests, the integration tests, and CI will support aws-lc-rs or *ring* as cryptography providers. We follow the precedent set upstream in Rustls 0.23 and make aws-lc-rs the default choice. We do not support a model where rustls-ffi is built with both crypto providers enabled. The Makefile/CMake build tooling only allows selecting one or the other.
Notably as this branch introduces a number of breaking API changes the version number is updated from 0.13.0 to 0.14.0.
Resolves https://github.com/rustls/rustls-ffi/issues/366
TODO
- [x] update Rustls 0.23.4 -> 0.23.11
- [ ] update
Changelog.mdto describe breaking changes (note: waiting on review feedback in case there are changes). - [x] consider trimming CI matrix down?
- [x] attempt a demo using a 3rd party crypto provider
- [x] stage a downstream update in
curl - [x] stage a downstream update in
mod-tls
I recommend reviewing this commit-by-commit. I've taken extra care to make sure each commit builds/tests cleanly along the way.
attempt a demo using a 3rd party crypto provider
I put together a rough proof of concept for this. I don't plan to "productionize" it at this time, but it was sufficient for proving the concept.
rustls-post-quantum-ffi is a new crate that I made that wraps up the Rust rustls-post-quantum provider. It exposes one fn, const void *rustls_post_quantum_ffi_crypto_provider(void), returning a void* to what is an Arc<CryptoProvider> under the hood.
I used that to quickly extend the rustls-ffi client.c demo so that it would use rustls_post_quantum_ffi_crypto_provider() in place of the AWS LC RS backend.
With that in place we can build client and observe it doing post-quantum KEX with Cloudflare's test server:
[daniel@blanc:~/Code/Rust/rustls-post-quantum-ffi]$ cargo capi install --prefix=/tmp/rustls-ffi
<snipped>
Finished `release` profile [optimized] target(s) in 0.08s
Installing pkg-config file
Installing header file
Installing static library
Installing shared library
[daniel@blanc:~/Code/Rust/rustls-ffi]$ cargo capi install --prefix=/tmp/rustls-ffi
<snipped>
Finished `release` profile [optimized] target(s) in 18.19s
Building pkg-config files
Populating uninstalled header directory
Installing pkg-config file
Installing header file
Installing static library
Installing shared library
[daniel@blanc:~/Code/Rust/rustls-ffi]$ tree /tmp/rustls-ffi/
/tmp/rustls-ffi/
├── include
│ ├── rustls.h
│ └── rustls-post-quantum-ffi.h
└── lib
├── librustls.a
├── librustls-post-quantum-ffi.a
├── librustls-post-quantum-ffi.so -> librustls-post-quantum-ffi.so.0.1.0
├── librustls-post-quantum-ffi.so.0.1.0
├── librustls.so -> librustls.so.0.14.0
├── librustls.so.0.14.0
└── pkgconfig
├── rustls.pc
└── rustls-post-quantum-ffi.pc
[daniel@blanc:~/Code/Rust/rustls-ffi]$ PKG_CONFIG_PATH=/tmp/rustls-ffi/lib/pkgconfig make --file=Makefile.pkg-config PROFILE=debug CRYPTO_PROVIDER=aws_lc_rs
<snipped>
[daniel@blanc:~/Code/Rust/rustls-ffi]$ LD_LIBRARY_PATH=/tmp/rustls-ffi/lib ldd target/client
<snipped>
librustls.so.0.14.0 => /tmp/rustls-ffi/lib/librustls.so.0.14.0 (0x00007ffff7085000)
librustls-post-quantum-ffi.so.0.1.0 => /tmp/rustls-ffi/lib/librustls-post-quantum-ffi.so.0.1.0 (0x00007ffff6dee000)
<snipped>
[daniel@blanc:~/Code/Rust/rustls-ffi]$ RUSTLS_PLATFORM_VERIFIER=1 LD_LIBRARY_PATH=/tmp/rustls-ffi/lib ./target/client pq.cloudflareresearch.com 443 /cdn-cgi/trace 2>/dev/null | grep "kex"
kex=X25519Kyber768Draft00
kex=X25519Kyber768Draft00
kex=X25519Kyber768Draft00
@ctz @jsha I'll be out of office next week. Is it possible you folks could take a look at this branch during that time so I can address feedback when I'm back? I think this is one where we want both reviews before merging. Thanks!
stage a downstream update in curl
Here's a WIP branch. For the time being I've left all of the vtls/rustls.c code using the interfaces that assume a clear default provider is available. That means whoever builds/installs the librustls.so/librustls.a that curl links choses the crypto provider. There are potentially more complicated options we could pursue where curl sets the default provider based on its own configuration, but its not clear if that's the direction the downstream project will want to go or not. Probably merits discussion.
stage a downstream update in
mod-tls
This one needs a bit more work, but here's the start of a HTTPD branch that builds mod-tls w/ 0.14.0. Like the curl branch for now I've left everything using the default crypto provider.
The experience made me think that it would be nice to offer a simplified certified key construction and supported ciphersuite list construction that assumes the default provider. These were two places where the mod-tls diff required getting a handle on the default explicitly and passing it in. It'd be more convenient for rustls-ffi to do that work for you. I'll push a revision for this when I'm back in office.
cpu force-pushed the cpu-366-choose-your-own-crypto-adventure branch from d29400b to b07551e
I noticed a Makefile omission where building with CRYPTO_PROVIDER=ring was causing aws-lc-rs to be built as well. The force push was for that fix.
consider trimming CI matrix down?
A full run is ~9m. That feels reasonable-ish to me, but it's also probably not necessary to build/test with both clang/gcc for each provider/platform choice. I'm going to leave this as-is until there's additional feedback to consider.
Finished reviewing. Two last thoughts:
- I think the test matrix should include a case that builds both ring and aws-lc-rs into the same binary, since that's supported.
- Ideally the version string should say which crypto providers the package was built against, though that can be a followup.
I think the test matrix should include a case that builds both ring and aws-lc-rs into the same binary, since that's supported.
I should make this more clear in the PR description, but I've tried to not support this use-case. It's technically true that you can build the rustls-ffi Rust components manually activating both --features=ring and --features=aws_lc_rs, but the Makefile, Makefile.pkg-config, and CMake build configurations only support choosing one or the other (defaulting to aws_lc_rs if not specified) and will never activate both features at once. IMO those are the only supported ways to build rustls-ffi and so we don't support builds w/ both providers.
I did that on purpose because I couldn't come up with a use-case where you would want both crypto providers built in, and it makes life harder for consuming applications. When both features are enabled the various builder fns that rely on a default will error unless a process-wide default has been explicitly set. If only one of the two features are enabled, the process is implicit and doesn't require any updates in the consumer code. As a concrete example of this the WIP curl diff I staged (diff) requires minimal changes under this model. Notably it doesn't need new code to install either rustls_aws_lc_rs_crypto_provider() or rustls_ring_crypto_provider() as the default.
WDYT? Do you think I should revisit this aspect of the design?
Ideally the version string should say which crypto providers the package was built against, though that can be a followup.
Great idea. I included an additional commit for this near the end of the series.
IMO those are the only supported ways to build rustls-ffi and so we don't support builds w/ both providers.
Maybe this is an oversimplification w.rt. cargo-c builds and it is worth having coverage? :thinking:
we don't support builds w/ both providers.
That's great, I support that. This makes me realize we need an update to the README discussing building with ring. That could be where we specifically document that building with both is unsupported.
@ctz Can I bug you for a second review? I'm out today but can finish up the changelog, README update, and other misc bits on ~Thurs if this looks good to everyone.
This makes me realize we need an update to the README discussing building with ring. That could be where we specifically document that building with both is unsupported.
Added a README update to discuss the crypto provider support, choosing one, and to mention both at once is unsupported.
update Changelog.md to describe breaking changes (note: waiting on review feedback in case there are changes).
Added extensive coverage of the added/breaking changes/removed updates to the CHANGELOG.md.
The experience made me think that it would be nice to offer a simplified certified key construction and supported ciphersuite list construction that assumes the default provider.
I reworked the rustls_signing_key update so that:
rustls_certified_key_build()retains theprivate_keyandprivate_key_lenarguments and uses the default crypto provider to create arustls_signing_keyto use for the certified key (or will error if not set).- a new
rustls_certified_key_build_with_signing_key()fn is added that allows providing arustls_signing_keyconstructed with arustls_crypto_providerin place of theprivate_keyandprivate_key_lenarguments ofrustls_certified_key_build(). The latter uses this function internally.
I also added a new rustls_default_supported_ciphersuites() fn to return the rustls_supported_ciphersuites of the default crypto provider (or an error if not set).
I think these two changes will make life easier for the downstream users that just want to work with the defaults and not handle a rustls_crypto_provider explicitly.
https://github.com/rustls/rustls-ffi/issues/422 - "Make rustls_client_config_builder_build fallible, rm NoneVerifier"
Lastly, I remembered this issue I had filed as a breaking change and rolled it into this branch. The crypto provider work already made rustls_client_config_builder_build fallible so it was straight forward to remove the NoneVerifier. A handful of tests needed to be updated to explicitly set a provider to avoid the new RUSTLS_RESULT_NO_SERVER_CERT_VERIFIER error returned by the builder operation when no verifier is set.
@jsha Did you want to give any of these updates another pass or is this safe to merge/release?
@jsha @ctz Thank you both for the reviews! Hopefully this is the last big API update for a while now that we have more parity between this repo and the Rustls 0.21+ crypto provider API.
cpu marked this pull request as draft 6 minutes ago
I think I need a couple more tweaks here (and test coverage) for consumers that want to use the implicit default provider. I'm seeing a failure mode downstream in curl where building a server cert verifier fails unexpectedly with this branch. I think the issue is that the verifier is built before a client config. When setting up the rustls-ffi verifier builder we query the process default provider, but it's None at this point because no configs have been built to have the Rustls-internal get_default_or_install_from_crate_features() do the required magic to install the default implied by unambiguous feature selection. When we try to build the verifier, we find there's no default provider and err.
Working on a fix.
Working on a fix.
The core issue here is that Rustls only sets the process default based on feature selection when calling ClientConfig::builder(), ServerConfig::builder(), WebPkiClientVerifier::builder() or WebPkiServerVerifier::builder().
That makes offering an API like the rustls_default_supported_ciphersuites() function in this patch set challenging. We want this function to act like upstream Rustls and work without having to explicitly register a default provider if we can install one on behalf of the user unambiguously based on crate features. However, since CryptoProvider::get_default_or_install_from_crate_features() is private we can't. We don't want to error and make the user explicitly install a default before calling it since there's an obvious default to use. We could add a hacky workaround like constructing a throw away ClientConfig::builder() and taking advantage of the side-effect it has but that's pretty gross :-/
A similar problem happens w.r.t loading certified keys. If you try to do this with the default provider workflow and haven't built a client or server config builder yet there isn't a process default provider and we can't set one from rustls-ffi based on the implied unambiguous default.
I've worked around the pain this causes in some places by changing where the builders are instantiated in a local WIP branch but I think to offer the best API we should make a case for Rustls exporting CryptoProvider::get_default_or_install_from_crate_features(). I'll see about pulling out a pitch for this in the coming days.
I was able to get curl building & passing its rustls test suite with my hacky branch but I think we have the time to get this right before pushing out a 0.14 release with an awkward API update.
I've worked around the pain this causes in some places by changing where the builders are instantiated in a local WIP branch but I think to offer the best API we should make a case for Rustls exporting CryptoProvider::get_default_or_install_from_crate_features(). I'll see about pulling out a pitch for this in the coming days.
It occurs to me as an intermediate step we can reproduce the logic local to rustls-ffi. I'm going to experiment with that.
It occurs to me as an intermediate step we can reproduce the logic local to rustls-ffi. I'm going to experiment with that.
I think this works well and I'm happy with the shape of things again :face_exhaling:
Summary of changes:
- I moved dropping the
NoneVerifierearlier in the diff since its less interesting - We get our own new
rustls-ffiinternalget_default_or_install_from_crate_features()fn incrypto_provider.rs - The functions where we were previously using
CryptoProvider::get_default()internal to various builder constructors are switched to use this function. - I dropped all of the
ensure_default()bits in the Rust unit tests - the implied process-wide default is used and we exercise the tests with bothCRYPTO_PROVIDER=aws_lc_rsandCRYPTO_PROVIDER=ring. - The
client.candserver.cdiffs are much smaller and no longer do as much involvedrustls_crypto_providerwork, instead using the process default provider.
I rebased my wip curl branch and its Rustls test suite is now passing. The actual API update is nice and small.
Still working on the Apache HTTPD mod-tls update. I'll keep this in draft until that's squared away.
@jsha Did you want to give any of these updates another pass or is this safe to merge/release?
Please go ahead without waiting for further reviews from me - thanks!
stage a downstream update in mod-tls
I've been working on this more (wip branch) and have all the mod_tls tests passing save one, test/modules/tls/test_10_session_id.py::TestSessionID::test_tls_10_session_id_12 (example failure). I'm not sure what's up with this one, I wouldn't have expected any changes w.r.t session resumption. I will dig deeper.
I think I also need to revisit some of my httpd diff. In my current version I updated tls_proto_init to collect up the apr_hash_set of rustls_supported_ciphersuite instances from the default crypto provider, but I think the lifetime of the rustls_supported_ciphersuite pointers are tied to the rustls_supported_ciphersuites object.
In our server.c/client.c examples this is fine because we peel out a rustls_supported_ciphersuite from the rustls_supported_ciphersuites and construct a crypto_provider_builder and crypto_provider with the rustls_supported_ciphersuite before calling rustls_supported_ciphersuites_free (relevant helper code). In the HTTPD arrangement the ciphersuites are stored in the hashset but not used to construct anything until later-on in tls_core.c's init_outgoing_connection and init_incoming functions. That means we can't free the rustls_supported_ciphersuites at the end of tls_proto_init or those pointers are dangling. To unblock progress I added a very hacky commit to leak the struct since otherwise segfaults ensued :skull:
I'm not sure the best architectural fix for this wrinkle yet. Perhaps the tls_proto_conf_t struct should be changed to hold onto the rustls_supported_ciphersuites directly? I think that struct is a singleton that lives as long as the server and isn't freed (?). Alternatively should we think of a way to make this easier in rustls-ffi? The ciphersuite impls are 'static so in theory this shouldn't be so painful.
I'm also not sure if either the test_10_session_id.py test failure, or the HTTPD supported ciphersuites architectural issue mentioned above should be considered blockers for this release. My intuition right now is that the answer is yes, and that fixing both is the best way to get confidence this release is ready. Divergent opinions welcome.
I'm not sure the best architectural fix for this wrinkle yet. Perhaps the tls_proto_conf_t struct should be changed to hold onto the rustls_supported_ciphersuites directly?
I think this turns out to be a crummy direction to pursue because the tls_proto_conf_t struct only holds members allocated from an APR memory pool that (I believe) is freed based on pool-level semantics. It'd be tricky to arrange rustls_supported_ciphersuites_free() to be called for a new rustls_supported_ciphersuites member.
Alternatively should we think of a way to make this easier in rustls-ffi? The ciphersuite impls are 'static so in theory this shouldn't be so painful.
This experience pushed me into realizing the pointers we give to supported ciphersuites need to be associated with the lifetime of the crypto provider they came from (default or otherwise). This also alleviates the need for the rustls_supported_ciphersuites type, which was awkwardly trying to have a lifetime separate from the rustls_crypto_provider by cloning the CryptoProvider.cipher_suites Vec.
For the HTTPD use-case where you're working with the default crypto provider the lifetime ends up being equivalent to 'static because the Arc<CryptoProvider> held by Rustls for the process-wide default won't be freed for the lifetime of the process. For the custom provider use-case you'll need to make sure the pointers to supported ciphersuites you get from the provider don't outlast the provider, but that is reasonable (and avoids having to manage the lifetime of a separate rustls_supported_ciphersuites collection).
I've reworked this aspect of the changeset and updated the changelog accordingly.
I've been working on this more (wip branch) and have all the mod_tls tests passing save one, test/modules/tls/test_10_session_id.py::TestSessionID::test_tls_10_session_id_12 (example failure).
I'm still pretty stumped by this. I was able to verify that the rustls-ffi server.c example works as the test expects with openssl s_client -reconnect -tls_12 seeing only the same Session-ID across the 6 negotiations (both with the "default" server config, and with a customized one using RUSTLS_CIPHERSUITE to customize down to one supported ciphersuite). This suggests to me the issue isn't with rustls-ffi but with the httpd diff. However, I'm also seeing the same failure mode with my local httpd build of trunk using rustls-ffi 0.13.0 (???). Very bizarre since it's passing on trunk in CI. I think I'm missing something...
I think I'm missing something...
As expected, I was missing something :laughing: I see the expected output w/ httpd@trunk and [email protected] once I updated my config to include the right mod:
[Wed Aug 28 21:10:13.939911 2024] [tls:warn] [pid 1385824:tid 1385824] AH10348: session cache [shmcb:mod_tls-sesss(64000)] could not be initialized, will continue without session one. Since this will impact performance, consider making use of the 'TLSSessionCache' directive. The error was: cache type 'shmcb' not supported (known names: ). Maybe you need to load the appropriate socache module (mod_socache_shmcb?).
I'm still not sure what's causing the divergence with my WIP branch and 0.14.0 :thinking:
Cargo: version 0.13.0 -> 0.14.0-rc1
I switched the version number from 0.14.0 to 0.14.0-rc1. Despite a bunch of debugging time I still haven't been able to get to the bottom of the HTTPD mod_tls TLS 1.2 resumption issue. Ctz agreed to take a look to see if he can spot what I'm missing. So far my debugging suggests the issue is in my HTTPD patch and not the rustls-ffi code, so I think merging this as-is and calling it a release candidate will make getting help easier. If Ctz comes up empty I'll beg/plead/bribe Icing@Apache to take a peek. When we get to the bottom of the problem we can cut a 0.14 with full confidence our two major downstream projects will be able to update w/o issue.
39 successful and 11 expected checks
Admin merging so I can fixup the required check names afterwards.
so I can fixup the required check names afterwards.
Done.
Ctz agreed to take a look to see if he can spot what I'm missing
Ctz to the rescue :superhero: :trophy: It turns out to be the combination of a pre-existing mod_tls bug (a cache session callback returning the wrong len for the data written to the buff) and a defensive change in rustls 0.23.11 (a change that made a codec implementation detect trailing data). I will fix the mod_tls bug alongside the update PR and we're clear to cut a final release of rustls-ffi 0.14. :tada: