mbedtls icon indicating copy to clipboard operation
mbedtls copied to clipboard

Implement TLS-Exporter

Open mfil opened this issue 1 year ago • 12 comments

Description

This pull request implements the TLS-Exporter feature as defined in RFC 8446, Section 7.5 and RFC 5705.

TLS-Exporter allows the client and server to extract additional shared symmetric keys from the SSL context by inputting a label and a desired length for the key.

Currently, it is possible for library users to implement TLS-Exporter in TLS 1.2 by using mbedtls_ssl_set_export_keys_cb() to obtain the master secret and then calculate mbedtls_ssl_tls_prf(). It is not currently possible to do this for TLS 1.3. This pull request adds the function mbedtls_ssl_export_keying_material() to implement TLS-Exporter in the library for both TLS 1.2 and 1.3.

~~I have marked this pull request as "Draft" because I have not yet added any automated tests, and I would like some help in figuring out what would be the best way to add them. I have added options to ssl_client2 and ssl_server2 to print out the derived symmetric keys on the command line. I have checked that when connecting openssl s_client (with the -keymatexport option) to ssl_server2, they both produce the same key.~~

I have added a test for the TLS 1.3 Exporter. I could not find test vectors online, so I have taken the "exp master" key from RFC 8448 and used an online HMAC-SHA256 calculator to calculate the expected result. Additionally, I have added options to ssl_client2 and ssl_server2 to print out the derived symmetric keys on the command line. I have checked that when connecting openssl s_client (with the -keymatexport option) to ssl_server2, they both export the same key.

PR checklist

Please remove the segment/s on either side of the | symbol as appropriate, and add any relevant link/s to the end of the line. If the provided content is part of the present PR remove the # symbol.

  • [x] changelog provided
  • [x] development PR provided
  • [x] framework PR not required
  • [x] 3.6 PR provided #9469
  • 2.28 PR not required because: TLS 1.3 is only experimental in this version, and the user can implement TLS-Exporter in TLS 1.2
  • [x] tests provided

mfil avatar Jul 23 '24 12:07 mfil

Sorry for the force-push. I forgot to sign off my latest commit, and there's no way to fix that with more commits.

I can close and reopen this merge request, but first I'd like to ask for some help with implementing unit tests.

mfil avatar Jul 25 '24 14:07 mfil

@mfil Thanks for this very succinct and well-documented PR. @waleed-elmelegy-arm and I plan to do reviews over the next couple of months so we can try to get this merged.

In the meantime, there are some merge conflicts (I think from the 3.6 release). Would you be able to resolve these? We aren't planning much work in this area of the code for a while, so it will be unlikely to bitrot again.

davidhorstmann-arm avatar Sep 18 '24 16:09 davidhorstmann-arm

@mfil Thanks for this very succinct and well-documented PR. @waleed-elmelegy-arm and I plan to do reviews over the next couple of months so we can try to get this merged.

In the meantime, there are some merge conflicts (I think from the 3.6 release). Would you be able to resolve these? We aren't planning much work in this area of the code for a while, so it will be unlikely to bitrot again.

I've rebased onto develop, fixed some mistakes in the comments, and force-pushed. (Let me know if you prefer opening a new pull request.)

I'll wait for reviews before changing the 3.6 backport pull request, ok?

mfil avatar Sep 20 '24 14:09 mfil

I'll wait for reviews before changing the 3.6 backport pull request, ok?

That sounds fine to me, thanks! I'll start the CI again, I seem to remember it failed the last time with a build error in some configurations. If you need a hand reproducing it let me know.

davidhorstmann-arm avatar Sep 20 '24 14:09 davidhorstmann-arm

I'll wait for reviews before changing the 3.6 backport pull request, ok?

That sounds fine to me, thanks! I'll start the CI again, I seem to remember it failed the last time with a build error in some configurations. If you need a hand reproducing it let me know.

Most of them make sense to me.

In all_u16-check_names, it complains that mbedtls_ssl_export_keying_material isn't declared in a header, but it's right there in include/mbedtls/ssl.h, so I don't understand that.

mfil avatar Sep 20 '24 16:09 mfil

@davidhorstmann-arm I think I fixed everything. Can you re-run the CI?

mfil avatar Sep 21 '24 09:09 mfil

I've started a CI run.

gilles-peskine-arm avatar Sep 21 '24 18:09 gilles-peskine-arm

I've started a CI run.

Thank you!

I'm stuck on getting the last tests to succeed. The TLS 1.2 Exporter needs client_random and server_random. If I'm seeing this correctly, if MBEDTLS_SSL_CONTEXT_SERIALIZATION is not defined, then these go away when the handshake is done.

Would it be a problem to always have a randbytes field in struct mbedtls_ssl_transform?

mfil avatar Sep 22 '24 08:09 mfil

I've started the CI on the updated PR

tom-cosgrove-arm avatar Sep 22 '24 09:09 tom-cosgrove-arm

Ok, the code style check is happy now, that still leaves my problem I mentioned above.

mfil avatar Sep 22 '24 10:09 mfil

I've fixed the merge conflict and added a commit that should fix the remaining CI failures. Please re-run the CI @davidhorstmann-arm @gilles-peskine-arm

mfil avatar Oct 18 '24 14:10 mfil

Hooray!

mfil avatar Oct 18 '24 16:10 mfil

@davidhorstmann-arm Thank you for the feedback! I could have used fewer magic numbers.

I think I addressed all the comments on the code. I will add some negative tests later.

mfil avatar Oct 23 '24 16:10 mfil

I think I addressed all the comments on the code. I will add some negative tests later.

Excellent, thanks! I will take a look tomorrow and hopefully have an answer re the best error code and new config option.

davidhorstmann-arm avatar Oct 23 '24 17:10 davidhorstmann-arm

(CI failures are infra-related/spurious)

davidhorstmann-arm avatar Oct 28 '24 18:10 davidhorstmann-arm

@davidhorstmann-arm I added some tests, mostly negative tests but I also added tests to check that the client and server export the same keys.

One of them is currently failing: TLS 1.3 Keying Material Exporter: Consistent results, large keys

The problem is coming from lines 165-168 in mbedtls_ssl_tls13_hkdf_expand_label() in ssl_tls13_keys.c.

    if (buf_len > MBEDTLS_SSL_TLS1_3_KEY_SCHEDULE_MAX_EXPANSION_LEN) {
        /* Should not happen, as above. */
        return MBEDTLS_ERR_SSL_INTERNAL_ERROR;
    }

OpenVPN wants to export 256 bytes in one go, so this is a problem. I'll check if I can make this function work with larger key sizes.

mfil avatar Oct 29 '24 16:10 mfil

OpenVPN wants to export 256 bytes in one go, so this is a problem. I'll check if I can make this function work with larger key sizes.

The limitation to 255 bytes seems to date back to #3568. It looks like it's just a case of changing ssl_tls13_hkdf_encode_label() to properly encode multi-byte lengths in the HkdfLabel struct, but it's remotely possible that there is some other reason for this limitation. I'll ask one of the reviewers of that PR if they have any idea.

davidhorstmann-arm avatar Oct 29 '24 17:10 davidhorstmann-arm

I fixed the problem with mbedtls_ssl_tls13_hkdf_expand_label(). Also, I just realized that the HKDF-Expand algorithm produces at most 255 * hash_size bytes of output, so I changed the input validation in mbedtls_ssl_tls13_export_keying_material() to reflect that.

The tests are working on my machine now, I'll look into test cases for the example programs.

mfil avatar Oct 29 '24 18:10 mfil

I will kick off the CI and take a look at the testcases probably some time next week.

For help with the example programs testcases, have a look at tests/ssl-opt.sh, especially this documentation comment and also see #9638 as a recent example of how to add new testcases for sample programs.

davidhorstmann-arm avatar Oct 29 '24 18:10 davidhorstmann-arm

@davidhorstmann-arm I tried to fix the CI failures, please run the tests again.

mfil avatar Oct 29 '24 23:10 mfil

I don't know what these errors mean:

On Windows-2013-Release-Win32-cmake (and others): "A volatile slot has not been closed properly"

On all_u16-build_arm_linux_gnueabi_gcc_arm5vte and others:

[2024-10-30T13:33:58.436Z] suites/test_suite_ssl.function: In function 'test_ssl_tls_exporter_rejects_bad_parameters':
[2024-10-30T13:33:58.436Z] suites/test_suite_ssl.function:5277:5: error: 'context' may be used uninitialized in this function [-Werror=maybe-uninitialized]
[2024-10-30T13:33:58.436Z]      mbedtls_free(context);
[2024-10-30T13:33:58.436Z]      ^
[2024-10-30T13:33:58.436Z] suites/test_suite_ssl.function:5276:5: error: 'label' may be used uninitialized in this function [-Werror=maybe-uninitialized]
[2024-10-30T13:33:58.436Z]      mbedtls_free(label);
[2024-10-30T13:33:58.436Z]      ^
[2024-10-30T13:33:58.436Z] suites/test_suite_ssl.function:5275:5: error: 'key_buffer' may be used uninitialized in this function [-Werror=maybe-uninitialized]
[2024-10-30T13:33:58.436Z]      mbedtls_free(key_buffer);
[2024-10-30T13:33:58.436Z]      ^

I don't know what it wants from me. The pointers are initialized to NULL. After the allocation, I memset the content to 0. (Looking at TEST_CALLOC, this should be redundant anyway.) What's uninitialized here?

On all_u22-test_clang_latest_opt and others: Too much to paste here, it's having some complaint about MD_OR_USE_PSA_INIT()?

Also, how can I reproduce these errors on my machine?

mfil avatar Oct 30 '24 15:10 mfil

I think I fixed the second point in my previous comment. I used the macro MD_OR_USE_PSA_INIT(), which may jump to the exit label, before declaring and initializing the variables. I don't know if that fixes the other errors too.

mfil avatar Oct 30 '24 18:10 mfil

"A volatile slot has not been closed properly"

A volatile slot is a key created through the PSA API, excluding persistent keys. This error occurs when a test creates a key but does not free it before calling PSA_DONE(). It would typically not happen only on Windows. Maybe it would happen elsewhere but it was hidden by the other errors?

The other errors you mention have a common cause: macros like PSA_INIT, TEST_ASSERT and so on jump to the exit label in case of failure. So every variable that's used in the cleanup code needs to be initialized before the first call to one of these macros. Your fix is right, let's see what the CI says.

Jobs using armcc are currently broken on OpenCI (but expected to pass on Internal CI). Please ignore these failures while we work on it.

gilles-peskine-arm avatar Oct 30 '24 18:10 gilles-peskine-arm

A volatile slot is a key created through the PSA API, excluding persistent keys. This error occurs when a test creates a key but does not free it before calling PSA_DONE().

Could this be because I'm calling MD_OR_USE_PSA_DONE() before freeing the endpoints?

It's happening on non-Windows systems too, but not on all. How can I reproduce these errors on my machine?

mfil avatar Oct 30 '24 20:10 mfil

We have a brand new CI failure FAQ. At a glance, all configurations with both MBEDTLS_USE_PSA_CRYPTO and TLS exporter enabled are failing because there's a key that isn't destroyed when the key store is shut down. So you can reproduce that by configuring with scripts/config.py set MBEDTLS_USE_PSA_CRYPTO or scripts/config.py full before building.

Could this be because I'm calling MD_OR_USE_PSA_DONE() before freeing the endpoints?

Well, yes, that would explain it. PSA_DONE (and variants) can't be called while a key is live, and the TLS session uses keys. Typically PSA_DONE is only called during cleanup.

gilles-peskine-arm avatar Oct 31 '24 14:10 gilles-peskine-arm

I reproduced the error by enabling MBEDTLS_USE_PSA_CRYPTO and fixed it.

That uncovered a new error: In the test TLS 1.2 Keying Material Exporter: Consistent results, large keys, I try to export keys of length UINT16_MAX. This works when MBEDTLS_USE_PSA_CRYPTO is disabled.

When it's enabled, it fails at psa_key_derivation_tls12_prf_generate_next_block in psa_crypto.c:

    /* We can't be wanting more output after block 0xff, otherwise
     * the capacity check in psa_key_derivation_output_bytes() would have
     * prevented this call. It could happen only if the operation
     * object was corrupted or if this function is called directly
     * inside the library. */
    if (tls12_prf->block_number == 0xff) {
        return PSA_ERROR_CORRUPTION_DETECTED;
    }

Apparently, the capacity check did not prevent it. And I don't think the capacity check is wrong because in principle the TLS1.2 PRF can produce arbitrarily large amounts of output. I've made the key size in the large key test somewhat smaller so now all tests should succeed.

I'm thinking that I should just set 255 * 32 = 8160 bytes (255 blocks of SHA256 output) as a global limit on the exported key size (regardless of TLS version, hash function, etc.). That ought to be enough for anybody.

mfil avatar Oct 31 '24 14:10 mfil

@mfil Thanks for raising that point about TLS12_PRF. Looking into it, that was a bug in our implementation. Our first KDF was HKDF and a few things about it leaked into other KDF. I've filed it as https://github.com/Mbed-TLS/mbedtls/issues/9744. Given the limited impact, I've added it to the backlog.

gilles-peskine-arm avatar Oct 31 '24 15:10 gilles-peskine-arm

@gilles-peskine-arm said elsewhere that the number of commits in this PR is maybe too large to review. If this is purely about the number of commits and not the number of lines: A large number of the commits are just fiddling with the tests. I'd be happy to clean up the commit log, but I thought that you prefer not force-pushing.

mfil avatar Nov 01 '24 15:11 mfil

I fixed the last four failing tests. The issue was that when running depends.py curves and depends.py hashes my new tests were segfaulting in mbedtls_test_ssl_endpoint_free().

I looked at other tests that use endpoints and added the following to depends-on: PSA_WANT_ECC_SECP_R1_384 and PSA_WANT_ALG_SHA_256. This causes the tests to be skipped when they would segfault.

But this solution feels a bit cargo-culty. Is it expected that segfaults can happen when I try to use endpoints without adding these dependencies?

mfil avatar Nov 06 '24 17:11 mfil

Is it expected that segfaults can happen when I try to use endpoints without adding these dependencies?

That depends on the nature of the segfault. If the segfault is in the library and it's called with valid parameters (e.g. not a null pointer where a valid context pointer is expected), that's bad. If the segfault is in the test code (e.g. it calls some library function without checking for failures), that may be acceptable, but we'd normally expect a test failure, not a segfault.

I'm not very familiar with the SSL test functions. What I'd normally do to investigate a segfault of unknown origin is either see what ASan+UBSan has to say (cmake -D CMAKE_BUILD_TYPE=ASan), or look in a debugger.

gilles-peskine-arm avatar Nov 06 '24 17:11 gilles-peskine-arm