seastar icon indicating copy to clipboard operation
seastar copied to clipboard

tls: Introduce OpenSSL

Open michael-redpanda opened this issue 11 months ago • 22 comments

Introduces OpenSSL as an alternative TLS implementation to GnuTLS. This is a build-time configuration controlled by the CMake variable Seastar_USE_OPENSSL. The configure.py script has been updated to now have a --crypto-provider option. Valid arguments to that are OpenSSL and GnuTLS.

This implementation was released in Redpanda v24.2 on July 31st, and has been running on production clusters since.

Redpanda implemented these changes in order to provide a FIPS-compliant build to customers that require it (such as those wishing to undergo FedRAMP evaluation). OpenSSL was selected as it allows implementors to maintain the validation of the cryptographic module even when it's built from source.

No changes have been introduced to enable the FIPS provider for Seastar. It is up to the implementor to enable and use the FIPS cryptographic module if desired.

Fixes: #698

michael-redpanda avatar Dec 09 '24 13:12 michael-redpanda

@elcallio please review

Are there any functional differences?

Is hot reload of certificates supported?

Should we support gnutls and openssl in parallel?

avikivity avatar Dec 09 '24 14:12 avikivity

Are there any functional differences?

The only major difference between the OpenSSL vs GnuTLS implementation is how the implementation is configured. With OpenSSL, there are 5 new methods that can be used to control it (compared to the single set_priority_string with GnuTLS):

  • set_cipher_string - used for controlling which ciphers are available in TLS1.2 and below (OpenSSL SSL_CTX_set_cihper_list)
  • set_ciphersuites - used for controlling which cipher suites are available in TLS1.3 (same doc page as above)
  • enable_server_precedence
  • set_minimum_tls_version - defaults to 1.0
  • set_maximum_tls_version - defaults to 1.3

There may be some other subtler differences, such as OpenSSL may be stricter about certificate contents (e.g. see https://github.com/scylladb/seastar/pull/2569/commits/1744b664e18e0cbdf5f25d87e4bca592f7a2c226 - this required the CA cert to have CA:True)

Is hot reload of certificates supported?

Yes

michael-redpanda avatar Dec 09 '24 14:12 michael-redpanda

The different configuration options mean one of two:

  • side-by-side support, at least for a transitional period
  • emulate priority strings for openssl (far from the focus of Seastar)

So I think we should choose one implementation, and deprecate the other after a transition period.

gnutls was chosen due to funky licensing and an unstable API from openssl, but I think that's behind us now. Given that, it's better to use the leader in the field rather than a follower.

avikivity avatar Dec 09 '24 15:12 avikivity

GnuTLS is not FIPS enabled, if compiled with it? (I see https://www.gnutls.org/manual/html_node/FIPS140_002d2-mode.html ) - what's missing?

mykaul avatar Dec 09 '24 16:12 mykaul

GnuTLS is not FIPS enabled, if compiled with it? (I see https://www.gnutls.org/manual/html_node/FIPS140_002d2-mode.html ) - what's missing?

The FIPS flag for GnuTLS means that GnuTLS will work in a FIPS compliant way (e.g. rejecting any non FIPS approved crypto like DES or GOST), however that doesn't mean that its implementation was validated. GnuTLS doesn't provide a path to build it and maintain validation, OpenSSL does (see above links to OpenSSL's security policy).

michael-redpanda avatar Dec 09 '24 20:12 michael-redpanda

gnutls was chosen due to funky licensing and an unstable API from openssl, but I think that's behind us now. Given that, it's better to use the leader in the field rather than a follower.

While I can somewhat sympathize with the sentiment, this would effectively mean dropping the way we handle TLS config for clients/applications (i.e. usage of the prio string). Since at least one database application I know of exposes this in its customer config, you'd effectively be asking to require changing all customer TLS configs where such a prio is applied. Not sure how many they are and how complicated the configs are. You said a transition period, but not sure how to handle this, nor enforce a config migration with clients?

elcallio avatar Dec 10 '24 11:12 elcallio

gnutls was chosen due to funky licensing and an unstable API from openssl, but I think that's behind us now. Given that, it's better to use the leader in the field rather than a follower.

While I can somewhat sympathize with the sentiment, this would effectively mean dropping the way we handle TLS config for clients/applications (i.e. usage of the prio string). Since at least one database application I know of exposes this in its customer config, you'd effectively be asking to require changing all customer TLS configs where such a prio is applied. Not sure how many they are and how complicated the configs are. You said a transition period, but not sure how to handle this, nor enforce a config migration with clients?

I would not want to force a config migration.

Claude says this:

The gnutls-cli tool includes a "--priority-debug" option that can show detailed information about a priority string's settings, but there isn't a direct tool to convert between GnuTLS priority strings and OpenSSL configuration formats.

You can analyze the GnuTLS priority string components with gnutls-cli --priority-debug "PRIORITY STRING" and then manually map those to equivalent OpenSSL configurations based on the cipher suites and protocols shown.

For more control, you could write a script that parses the gnutls-cli output and maps the components to OpenSSL's configuration syntax, though you'd need to account for the differences in how each library names and groups their ciphers and protocols.

avikivity avatar Dec 10 '24 12:12 avikivity

Yes, but I am honestly very nervous about writing/maintaining a prio string translator. The mapping is not just cipher to cipher etc, it is a state machine in itself, disabling and adding ciphers, exchange modes etc. As for using cli tool and manually map - that is what a customer would have to do if we changed the config approach.

elcallio avatar Dec 10 '24 12:12 elcallio

Yes, but I am honestly very nervous about writing/maintaining a prio string translator. The mapping is not just cipher to cipher etc, it is a state machine in itself, disabling and adding ciphers, exchange modes etc. As for using cli tool and manually map - that is what a customer would have to do if we changed the config approach.

I agree with that.

@tzach do you have any insight about priority string configuration across our fleet? Do we ever diverge from the default?

avikivity avatar Dec 10 '24 12:12 avikivity

Yes, but I am honestly very nervous about writing/maintaining a prio string translator. The mapping is not just cipher to cipher etc, it is a state machine in itself, disabling and adding ciphers, exchange modes etc. As for using cli tool and manually map - that is what a customer would have to do if we changed the config approach.

I agree with that.

@tzach do you have any insight about priority string configuration across our fleet? Do we ever diverge from the default?

Since we did not disable TLSv1.1 by default (not sure why), there's a good chance users do it - https://enterprise.docs.scylladb.com/stable/operating-scylla/security/client-node-encryption.html#priority-string-and-tlsv1-2-1-3-support

mykaul avatar Dec 10 '24 12:12 mykaul

Force push fc616ee:

  • Added compile definition SEASTAR_USE_GNUTLS for when GnuTLS is the cryptographic provder
  • Updated #ifndef SEASTAR_USE_OPENSSL to instead be #ifdef SEASTAR_USE_GNUTLS
  • Move implementation specific credential builder methods to the implementation specific compilation units
  • Removed tls_session_logger and added a pretty-print function for the OpenSSL implementation of seastar::tls::session
  • Dynamically setting minimum security level passed to SSL_CTX_set_security_level

michael-redpanda avatar Dec 12 '24 01:12 michael-redpanda

Force push 1a53361:

  • Added missing compile definition for SEASTAR_USE_GNUTLS when compiled to use modules
  • Added missing header include for fmt/ostream.hh when compiled to use modules

michael-redpanda avatar Dec 12 '24 13:12 michael-redpanda

Force push 2f935ca:

  • Addressed issues from comments
  • Created GnuTLS & OpenSSL implementations for SHA1-Base64 used by websocket
  • Fixed nits

michael-redpanda avatar Dec 20 '24 15:12 michael-redpanda

Force push db318da:

  • Addressed PR comments

michael-redpanda avatar Jan 15 '25 13:01 michael-redpanda

Force push 245d15c:

  • Rebased off master to fix merge conflicts introduced by https://github.com/scylladb/seastar/commit/6f39b89dbda812c7985918d3d8d6d7f7792b0619#diff-2f6778574276aef7b957daacf9904fc252fe1b44d45819bc89daafce9dac3972R1042

michael-redpanda avatar Jan 15 '25 14:01 michael-redpanda

Force push cab469f:

  • Fixed typo

michael-redpanda avatar Jan 15 '25 15:01 michael-redpanda

Force push 47dfb11:

  • Fixed logic of get_security_level
    • Returning anything >1 when using OpenSSL 3.0 will introduce key restrictions that may cause Seastar to be unable to connect with other endpoints
  • Addressed error in test_reload_certificates_with_only_shard0_notify

michael-redpanda avatar Jan 15 '25 17:01 michael-redpanda

Note test failures.

I'm conflicted here. On the one hand, the tls implementation has a user footprint in priority string or equivalent openssl config. On the other hand, that's a tiny difference and they're otherwise equivalent. OpenSSL would be my choice as the only supported library if there wasn't the user facing stuff.

Can we translate the priority string to openssl config? I asked Claude and it spewed a long python script, maybe that's good enough.

avikivity avatar Jan 22 '25 19:01 avikivity

Note test failures

It appears all failures are from Seastar.unit.rpc, which I don't believe interacts with the TLS. Were they flaky before?

Can we translate the priority string to openssl config? I asked Claude and it spewed a long python script, maybe that's good enough.

It definitely could be done by setting all of these within an OpenSSL config file. You'd just have to be careful to tell OpenSSL where to find it so it doesn't attempt to locate and use the system default one (unless that's desired).

michael-redpanda avatar Jan 22 '25 19:01 michael-redpanda

Can we translate the priority string to openssl config? I asked Claude and it spewed a long python script, maybe that's good enough.

I guess your question is "can I take the input toset_priority_string and turn that into the appropriate OpenSSL API calls"? It could be doable, but would take a while and I'm not sure if that's something anyone would really want to maintain.

michael-redpanda avatar Jan 22 '25 19:01 michael-redpanda

Can we translate the priority string to openssl config? I asked Claude and it spewed a long python script, maybe that's good enough.

I guess your question is "can I take the input toset_priority_string and turn that into the appropriate OpenSSL API calls"? It could be doable, but would take a while and I'm not sure if that's something anyone would really want to maintain.

Is there no equivalent textual configuration?

I wouldn't want to maintain it, but neither do I want to maintain two different tls implementations.

avikivity avatar Jan 22 '25 19:01 avikivity

Is there no equivalent textual configuration?

AFAIK only through the OpenSSL config file which can be loaded either automatically by OpenSSL when it is initialized or can be controlled by the user by calling OSSL_LIB_CTX_load_config. And within there would be number of key-value pairs that would need to be set correctly.

michael-redpanda avatar Jan 22 '25 19:01 michael-redpanda

Hi again @avikivity and @elcallio . Apologies for such a long delay in updating this PR. I've made some updates and rebased off of master.

michael-redpanda avatar Jun 24 '25 19:06 michael-redpanda

Hi again @avikivity and @elcallio . Was wondering if there was anything I could do to assist in regards to this PR?

michael-redpanda avatar Jul 21 '25 12:07 michael-redpanda

I prefer replacing gnutls with openssl rather than adding yet another option. But perhaps we can't due to those minor configuration issues.

Maybe we can start a long deprecation cycle during which users will be encouraged to update their configuration.

avikivity avatar Jul 21 '25 12:07 avikivity

If we indeed start a deprecation cycle, it will have to be run-time selectable (so users can update their configuration, change the selector, restart). I don't know how difficult that is.

avikivity avatar Jul 21 '25 12:07 avikivity

From an API standpoint, I guess replacing the prio strings with some sort of abstracted algorithm-selection is the biggest hurdle (std::set<tls_algo> allow, forbid?). Once users migrate this, the rest should be fairly transparent/equivalent.

elcallio avatar Jul 21 '25 12:07 elcallio

If we indeed start a deprecation cycle, it will have to be run-time selectable (so users can update their configuration, change the selector, restart). I don't know how difficult that is.

I believe this would require a significant change to the PR to support this. If this is what is desired, more than happy to look into it, but I would have to close this PR and open a new one.

michael-redpanda avatar Jul 31 '25 10:07 michael-redpanda

@avikivity I haven't looked too deep into it yet, but I think making it runtime selectable would be a change to the API

michael-redpanda avatar Jul 31 '25 10:07 michael-redpanda

@avikivity I haven't looked too deep into it yet, but I think making it runtime selectable would be a change to the API

Why? If the API suits both openssl and gnutl statically, it should suit them dynamically, no?

avikivity avatar Jul 31 '25 10:07 avikivity