[DNM] Add Rustls compile time implementation
Overview
The PR implements native Rustls usage within pingora-core. The feature can be activated using feature flag rustls. The changes involved restructuring/abstraction within the pingora-core to allow for multiple TLS implementations.
As of now there are no danger elements involved within the Rustls implementation.
The PR is intended to provide a basis for further discussions and enhancements. It is expected that further works will be needed. The current status has reached a fully working state and seems to provide good initial results.
https://github.com/cloudflare/pingora/issues/29
Notes
As I'm quite new to Rust I may have taken choices within the implementation that are not good, may have missed best practices or other things. The PR was created with the goal of further exploring the Rust language and to gather more knowledge on the pingora framework.
There are likely areas within the code that can and should be improved. I highly appreciate any feedback or discussions that help to improve the implementation.
Implementation
The implementation relies on the crates provided from the rustls project and utilizes tokio-rustls.
As of now the pingora-rustls crate is very slim and only contains helper functions and re-exports. tokio-rustls was directly integrated with pingora-core.
The feature flag rustls adds the pingora-rustls dependency to pingora-core and triggers compile time includes within pingora-core.
Significant API changes
pingora_core::connectors::tls::Connector&pingora_core::listeners::tls::Acceptorare now havingBox<dyn Trait>members instead of concrete types.- server
handshakerelated parameters & returns are now based onBox<dyn IO + Send> - Some structural elements have been renamed from a
sslform to atlsform (e.g.SslAcceptortoTLSAcceptor). - Stored certificates in non-implementation specific areas are now using DER format as
[u8]which is favorable to instantiating the according certs/keys from BoringSSL/OpenSSL and Rustls.
... and multiple others as OpenSSL/BoringSSL is currently tied into pingora-core at different areas.
Tests
The existing tests pass with BoringSSL, OpenSSL and Rustls. Some tests where slightly modified to avoid errors/warnings with the different SSL/TLS feature flags. Tests in related to the certificate callback and no-verify options are disabled when using the rustls feature.
The tests for Rustls need to run with a certificate that is valid for all requests as there is currently no option implemented to disable hostname verification.
Performance
As of now there is no in-depth comparison between the different SSL/TLS implementations available. Some quick initial tests where performed to check some basic metrics. None of these performance statements are reliable indicators as they were only measured on a local machine without checking further relevant details.
During the tests the used artifacts had been compiled using a Cargo profile that inherits = "release".
Areas that where checked on the local machine:
- TLS handshakes (v1.2) using tls-perf
- simple GET at a rates of 1500 req/sec using ali
- simple POST requests sending 8MB of data using ali
From these quick and basic tests it looks like that
- the OpenSSL implementation compared to current main branch v0.3.0 does not show significant differences.
- the Rustls implementation is providing improvements. The gap seems to be closest on the POST result comparison.
Thank you for very much for your attention and have a nice week. :grinning:
Kind regards, Harald
Further Thoughts & Considerations
... in unsorted order
-
layout/structure to allow generic TLS provider integration
- creation of a
pingora-core-apicrate to avoid cyclic dependencies- moving the TLS provider traits into that crate
- updating implementation in
pingora-coreTLS areas to use the traits frompingora-core-api - adding actual implementation using compile time feature configuration
- moving TLS provider specific code into TLS crates
- e.g. create
pingora-boringssl-opensslcrate to avoid code duplication for BoringSSL & OpenSSL - move BoringSSL & OpenSSL related code into that create
- re-exporting
pingora-boringssl-opensslimplementations via thepingora-opensslandpingora-boringsslcrates and use these to select the TLS provider during compile time - move Rustls specific code into
pingora-rustls
- e.g. create
- allowing potential other TLS backends to be implemented (e.g. tokio-tls-native, ...)
- creation of a
-
certificate/key storage and parsing
- within the PR the storage format is currently adapted to DER [u8] which might not be the best choice in terms of performance
- in Rustls this does as align well with the internal storage type beeing either
Vec<u8>or&[u8]in DER.
- in Rustls this does as align well with the internal storage type beeing either
- currently parts of the
pingora-core::utils::tlsare BoringSSL/OpenSSL specific implementations - within Rustls there is no X509 parsing available, and the project mentions using the x509-parser crate
- depending on the storage format, it might be helpful to unify the utils section and to add the x509-parser directly to
pingora-coreand use it unified for all TLS implementations
- within the PR the storage format is currently adapted to DER [u8] which might not be the best choice in terms of performance
-
testing
- are performance tests / benchmarks available for the TLS area to validate the impact changes?
- benchmarking TLS implementations within core (connector, acceptor, proxy from connector to acceptor)
- enhance test coverage for various Rustls specific implementations
-
extending the CI
- test various TLS implementations (e.g. matrix with `tls_feature: [openssl, boringssl, rustls], executing tests & benchmarks)
- run benchmarks and create comparing reports during PRs
-
identical behaviour to BoringSSL/OpenSSL in the area of
verify/sni/altvalidation -
Rustls Post Quantum support
This is incredible work! I (and the rest of the pingora team) are going to start taking a look at this now and start putting in feedback as it comes up. My first reaction (other than "Wow!") is this is a giant pr, so the first thing I'm going to try to do is figure out a way to break it up into smaller pieces. This makes it easier to review, but also since we use pingora as the foundation for multiple services in production, we try not to let it change too much too fast π
Here is the order of operations I recommend.
- Leave this pr open as a reference, but add
[DNM](do not merge) to the title. - Wait for the public sync today (or next week at the latest) - @palant has a pr for no-op tls that should get synced to the public repo today, and that might cause some conflicts with this pr.
- Add separate, stacked prs with this (suggested) break down
- Structural refactoring - Just the minimum changes to establish the new
boringssl_opensslmodule and add therustlsfeature (minimal code changes). This will eliminate a lot of the noise from the pr. - Certificate changes - Changes to make the existing boringssl_openssl implementations, examples and tests work with the modified
CertKeytype. To me this is the change with the biggest risk since it will affect the tls code we are using in production. - Introduce
pingora-rustls- Add this crate to the workspace and wire it in with the features - Completed Implementation - Everything else.
- Structural refactoring - Just the minimum changes to establish the new
I want to reiterate, that this is an enormous amount of work, and an amazing undertaking for one person π . rustls integration is something we have been planning to do for a while. To that end, if the above steps seem too tedious or onerous, I will totally understand. We will gladly push this over the finish line with the work you have already done (and make sure you are credited for your contributions). Obviously we would rather keep working with you directly, but you should know that you have already helped us a lot, so thank for you for this contribution (and hopefully lots more in the future).
PS. If you are new to rust it doesn't show π
Your Further Thoughts comment has some good points on how the tls and api code can be reorganized (among other things). Now that we are introducing rustls and no-op tls to the code base, it does make sense to formalize and isolate those features in their own crate. The big thing we want to ensure is that at the pingora-sever level, we aren't adding any dynamic or enum dispatches to just to account for extra tls-provider options. As @eaufavor mentioned here, we will internally be using open/boring ssl internally for compliance reasons for (basically) ever. We also don't see a need to support multiple providers at runtime, so we want to make sure that the provider selected via compile-time features is accessible directly.
You also mentioned testing and feature matrices within the ci. Those are good callouts, and part of the changes I'm working on to go with the pr for no-op tls.
Interesting approach. I was also thinking about this, and I thought about standardizing on the Rustls API rather than introducing separate code paths for Rustls and OpenSSL/BoringSSL. The reason is that Rustls has a better API (at least for Rust consumers), and it supports different backends. A BoringSSL backend already exists, and writing an OpenSSL backend providing only the necessary functionality should be doable.
But I guess that this kind of standardization can still be done later once the Rustls codepath lands β if itβs considered desirable.
Hi @johnhurt,
thank you very much for your kind words and the feedback. :smiley:
I'd be interested in continuing working on this change, and fully support the suggested approach to split that PR into pieces. I'm currently travelling and will be back in around 2-3 weeks. In case someone wants to work meanwhile on the topic I'd be as well happy to support any efforts that help to bring this feature forward.
I've renamed the PR to have the [DNM] prefix.
It is not fully clear to me which path is desired for the suggested stacked PRs:
- Should the current PR be split and be continue to be part of
pingora-core? - Should the current PR be further adapted to the separate crates approach (introduce the
pingora-core-api, have the implementations within e.g.pingora-boringssl-openssl,pingora-rustls?
For splitting up the PR into the stacked ones it would be as well helpful to have an initial PR review on the code side to allow the first set of feedback to be directly incorporated with these PRs. I know, this request is unfortunate as one of the goals of stacking the PRs is to simplify reviewing.
From a performance, dependency, distribution point of view I think it is desirable to stay with compile time TLS implementation selection. Fully avoiding dynamic dispatching could eventually work out, but the current approach does contain trait objects as struct members.
I'm thinking it might be worthwhile to start with a benchmarking PR for the acceptor/connector components to have meaningful way for comparisons. The tooling here could be valgrind based (e.g. callgrind/cachegrind/dhat). A time based benchmark test for the full acceptor/connector will likely not produce stable figures due to the complexity and async runtime. What do you think of trying iai_callgrind for that purpose?
Enjoy the summer and have a great time. :sun_with_face:
Kind regards, Harald
Awesome, @hargut. I marked this as accepted. I'll split it up into smaller PRs internally and you will see them come out slowly in our weekly syncs. I am going to lean towards keeping everything in the pingora-core as opposed to adding new crates. I think we might go that way in the future, but for development, this is a little easier ... plus crates.io already complains enough about the number of crates we publish.
I want to thank you again for this giant piece of work and effort. I'll make sure you are the author on the commits I make that are derived from your code. If you are willing to keep contributing then, a benchmarking framework would be a great place to start. That is something we don't have for pingora, and there are times (like now) when comparative benchmarks would be helpful. I am partial to criterion, but I have never used it outside of simple tests. Thanks,
Kevin
Hi @johnhurt ,
thank you for looking into splitting up the PRs in to smaller ones and taking care of the process.
As one of my goals with this PR is learning Rust and related details, I'd highly appreciate to have a chance to receive the technical feedback to allow learning from mistakes having been made. Can you please somehow (also via mail if easier) provide the feedback received?
I've created a separate ticket for the benchmark topic and will be looking into it.
Kind regards, Harald
Totally, @hargut. I'll update this ticket whenever we review/release part of this original cr to update you on the progress and what we find. Internally we are reviewing the changes to the existing openssl-boringssl implementations. We are going to spend the most time with this because this is only code path we are going to use internally, and we want to make sure it isn't affected by including a rustls implementation.
The only thing I have right now in my notes from reading through to deciding where to split is that I that the listeners::tls::SSLAcceptorBuilder which in this pr is a Box<dyn Trait> could probably be replaced with a #[cfg(feature)] to allow it to work with all tls implementations, but still static dispatch.
I like the new issue you put in, and I appreciate all your contribution. Thanks!
Hi @johnhurt,
FYI: I've pushed a simplification commit within this branch removing several Boxes from method signatures.
Kind regards, Harald
Awesome. Thank. I'm the one traveling this time, so I'll take a look at it when I get home
Hi @johnhurt,
I'm currently having a closer look at the differences in the BoringSSL/OpenSSL implementations comparing pingora/0.3.0 with this branch (#336 ) using #367.
The first performance related enhancement was removing the indirection through the trait in protocols::tls::TlsStream.tls related calls. The indirection was present in the call graph, and is gone after applying the change. This change also had positive effects when comparing the result stats.
Have a great weekend. :smiley:
Kind regards, Harald
Hey, @hargut. I am almost done with an internal recreation of your pr. I wanted to give you an update and save some of your time because it looks like (based on your most recent prs) we came to the same conclusions.
My main goal is to get rustls support added in without making any (meaningful) changes to the existing ssl implementations. What I did was take all the trait-level abstractions you (used to) have, and replace them with compile-time configuration switching. The result is our ssl implementations are unchanged (other than some type aliasing), and the rustls implementation loses the need for any indirection other than what it takes to make its api look like what we are already using to talk to openssl.
I'll let you know when this is about to be synced so you can get prepared for the outpouring of appreciation from all the people waiting for specifically this feature. I think you can consider this pr complete. Thanks!
Hi @johnhurt,
thank you for the feedback and your help in landing this feature! :+1:
The past commits from my end, had been focused on getting the benchmark stats as close as possible to the original implementation. As of now, I feel that the PR could have been less complicated in regards to layout and abstractions. As you already mentioned using a more direct approach with compile time configs is favourable in regards to performance.
Likely I'll add another commit to "finish" the intended work from my end. But I solely consider this efforts as educational efforts for myself and to get familiar with the tooling and (more) performance details.
Kind regards, Harald :smiley:
This has been merged! Thanks for your contributions π€
Hi @johnhurt,
thank you for all your help in this matter. :+1: :partying_face:
Kind regards, Harald