Asymmetry when handling special chars with mbedtls_x509write_csr_set_subject_name and mbedtls_x509_dn_gets

Open mman opened this issue 2 years ago • 1 comments

Summary

I am generating certificates whose subject names contain (for historical reasons) special characters such as + and #.

The certificate subject field is set from a zero-terminated C string, without any special quoting or escaping, via mbedtls_x509write_csr_set_subject_name, and parsed back via mbedtls_x509_dn_gets(buf, size, &crt.subject).

For example (simplified):

// mbedtls-2.16 code
// writing subject string 'CN=+#'
mbedtls_x509write_csr_set_subject_name(&crtw, "CN=+#");

// reading into buf
mbedtls_x509_dn_gets(buf, size, &crt.subject);
// buf now contains 'CN=+#'

Under mbedtls-2.16 releases, reading back a subject with special characters returned it as is, so the input string and the output string were identical.

Since mbedtls-2.28, and continuing into the development branch, mbedtls_x509_dn_gets produces escaped output where every special character is prefixed with a double backslash \\

// mbedtls-2.28 code
// writing subject string 'CN=+#'
mbedtls_x509write_csr_set_subject_name(&crtw, "CN=+#");

// reading into buf
mbedtls_x509_dn_gets(buf, size, &crt.subject);
// buf now contains 'CN=\\+\\#'

This is probably related to the PR https://github.com/Mbed-TLS/mbedtls/pull/5861 merged into development and mbedtls-2.28 around May 2022.

As a result, the parsed subject string is inconsistent with the one passed into mbedtls_x509write_csr_set_subject_name.

Escaping the string before passing it into mbedtls_x509write_csr_set_subject_name does not help: the call fails with the Mbed TLS error MBEDTLS_ERR_X509_INVALID_NAME.

So if I understand everything correctly, we now have an asymmetry where specifying a subject field requires not escaping special chars, but parsing out the subject field returns all special chars escaped.

I am currently working around the issue by removing the two consecutive backslashes \\ from the parsed subject field to undo the escaping, but I am afraid I may not fully understand what is going on.
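A minimal sketch of that workaround in plain C, assuming the parsed string only contains backslash-plus-character escapes (no hex pairs). The helper name dn_unescape is hypothetical and not part of Mbed TLS. If the subject has several components, any splitting on separators would have to happen before unescaping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mbed TLS: copy `in` to `out`, dropping the
 * backslash in front of each escaped character.
 * Returns the unescaped length, or -1 if `out` is too small or `in` ends
 * with a dangling backslash. */
static int dn_unescape(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    for (; *in != '\0'; in++) {
        if (*in == '\\') {
            in++;                 /* skip the backslash, keep what follows */
            if (*in == '\0') {
                return -1;        /* nothing left to unescape */
            }
        }
        if (n + 1 >= out_size) {
            return -1;            /* no room for this byte plus the NUL */
        }
        out[n++] = *in;
    }
    out[n] = '\0';
    return (int) n;
}
```

It would be called on the buffer filled by mbedtls_x509_dn_gets, with an output buffer at least as large as the input.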

Expected behavior

As a developer I would expect symmetry between the set/get calls, with mbedtls_x509write_csr_set_subject_name accepting and mbedtls_x509_dn_gets producing the subject in the same format.

Actual behavior

mbedtls_x509write_csr_set_subject_name requires no escaping of special chars.

mbedtls_x509_dn_gets produces special chars escaped with a double backslash \\.

mman avatar Sep 10 '22 14:09 mman

Thanks for flagging this up!

I've done some preliminary investigation and I can confirm that there seems to be improper handling of escaped special characters in mbedtls_x509_string_to_names(). I can reproduce the errors with our example programs.

However, I haven't been able to reproduce the double-backslash \\+ escaping. Are you sure this is in Mbed TLS rather than the way you're outputting the string? Given our current escaping mechanism I would expect either single-escaped \+ or double-escaped \\\+.

The \\+ looks like an artifact of our escaping to \+ followed by a more normal escaping that doesn't treat + as a special character (so only the \ is escaped).
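To illustrate how such an artifact could arise, here is a small sketch of a generic "print as a quoted literal" step. The function literal_escape is a hypothetical stand-in for whatever debugger or logger displays the string; it is not Mbed TLS code. It escapes only backslash and double quote, so an already escaped \+ comes out as \\+.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tool that shows a string as a quoted literal:
 * it escapes backslash and double quote, but does not treat '+' or '#' as
 * special. Returns the output length, or -1 if `out` is too small. */
static int literal_escape(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        size_t need = (*in == '\\' || *in == '"') ? 2 : 1;
        if (n + need >= out_size) {
            return -1;
        }
        if (need == 2) {
            out[n++] = '\\';
        }
        out[n++] = *in;
    }
    if (n >= out_size) {
        return -1;
    }
    out[n] = '\0';
    return (int) n;
}
```

Feeding it the singly escaped CN=\+\# yields CN=\\+\\#, which matches the shape reported in the issue.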

davidhorstmann-arm avatar Sep 13 '22 17:09 davidhorstmann-arm

Sorry @davidhorstmann-arm it took me so long to get back into it.

I am actually interfacing with mbedtls from Swift, so I had to double check what I am looking at and who is escaping what, but here are the results:

When making a cert, I pass in a UTF-8 zero-terminated C string that looks like this (13 bytes including the trailing zero), no escaping:

CN= '"#+ "'+
(lldb) expr -f Y -- specialSubject.utf8CString[0]
(CChar) $R0 = 43    C
(lldb) expr -f Y -- specialSubject.utf8CString[1]
(CChar) $R1 = 4e    N
(lldb) expr -f Y -- specialSubject.utf8CString[2]
(CChar) $R2 = 3d    =
(lldb) expr -f Y -- specialSubject.utf8CString[3]
(CChar) $R3 = 20     
(lldb) expr -f Y -- specialSubject.utf8CString[4]
(CChar) $R4 = 27    '
(lldb) expr -f Y -- specialSubject.utf8CString[5]
(CChar) $R5 = 22    "
(lldb) expr -f Y -- specialSubject.utf8CString[6]
(CChar) $R6 = 23    #
(lldb) expr -f Y -- specialSubject.utf8CString[7]
(CChar) $R7 = 2b    +
(lldb) expr -f Y -- specialSubject.utf8CString[8]
(CChar) $R8 = 20     
(lldb) expr -f Y -- specialSubject.utf8CString[9]
(CChar) $R9 = 22    "
(lldb) expr -f Y -- specialSubject.utf8CString[10]
(CChar) $R10 = 27    '
(lldb) expr -f Y -- specialSubject.utf8CString[11]
(CChar) $R11 = 2b    +
(lldb) expr -f Y -- specialSubject.utf8CString[12]
(CChar) $R12 = 00    .

When I parse the cert back, its subject is this UTF-8 zero-terminated C string (18 bytes including the trailing zero):

(lldb) expr -f Y -- buf[0]
(Int8) $R39 = 43    C
(lldb) expr -f Y -- buf[1]
(Int8) $R40 = 4e    N
(lldb) expr -f Y -- buf[2]
(Int8) $R41 = 3d    =
(lldb) expr -f Y -- buf[3]
(Int8) $R42 = 20     
(lldb) expr -f Y -- buf[4]
(Int8) $R43 = 27    '
(lldb) expr -f Y -- buf[5]
(Int8) $R44 = 5c    \
(lldb) expr -f Y -- buf[6]
(Int8) $R45 = 22    "
(lldb) expr -f Y -- buf[7]
(Int8) $R46 = 5c    \
(lldb) expr -f Y -- buf[8]
(Int8) $R47 = 23    #
(lldb) expr -f Y -- buf[9]
(Int8) $R48 = 5c    \
(lldb) expr -f Y -- buf[10]
(Int8) $R49 = 2b    +
(lldb) expr -f Y -- buf[11]
(Int8) $R50 = 20     
(lldb) expr -f Y -- buf[12]
(Int8) $R51 = 5c    \
(lldb) expr -f Y -- buf[13]
(Int8) $R52 = 22    "
(lldb) expr -f Y -- buf[14]
(Int8) $R53 = 27    '
(lldb) expr -f Y -- buf[15]
(Int8) $R54 = 5c    \
(lldb) expr -f Y -- buf[16]
(Int8) $R55 = 2b    +
(lldb) expr -f Y -- buf[17]
(Int8) $R56 = 00    .

mman avatar Sep 26 '22 14:09 mman

And when I try to feed in the subject escaped as it was returned, mbedtls_x509write_csr_set_subject_name does not accept it and returns error -9088 (-0x2380, MBEDTLS_ERR_X509_INVALID_NAME).
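Mbed TLS error codes are defined as negative hexadecimal constants, so a decimal value like -9088 is easier to look up once it is printed in hex. A small sketch in plain C; format_error_code is a hypothetical helper name, not an Mbed TLS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format an error code the way the Mbed TLS headers
 * spell it, e.g. -9088 becomes "-0x2380".
 * Returns what snprintf returns (number of characters needed). */
static int format_error_code(int ret, char *buf, size_t buf_size)
{
    /* Negate in unsigned arithmetic to avoid overflow on INT_MIN. */
    unsigned int magnitude =
        ret < 0 ? 0u - (unsigned int) ret : (unsigned int) ret;

    assert(buf != NULL);
    return snprintf(buf, buf_size, "%s0x%04X", ret < 0 ? "-" : "", magnitude);
}
```

When the library is built with error string support, mbedtls_strerror() from mbedtls/error.h can turn the same code into a readable message.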

mman avatar Sep 26 '22 14:09 mman

@mman thanks for the extra info, it looks like special characters are only singly-escaped, which is a relief.

As you've correctly identified, there's an asymmetry between the characters we escape in mbedtls_x509_dn_gets() and the escaped characters we accept in mbedtls_x509write_csr_set_subject_name(). The fix for this will move us to escaping all special characters properly, since we're moving towards compliance with RFC 4514.

davidhorstmann-arm avatar Sep 27 '22 08:09 davidhorstmann-arm

This looks like the same issue as #1865 - mbedtls_x509_string_to_names does not handle special characters properly. I'll close that as a duplicate.

daverodgman avatar Oct 21 '22 11:10 daverodgman

This should now have been fixed by the closing of #7924 via #8025.

mbedtls_x509write_csr_set_subject_name() should now accept the special characters escaped in the same way as they are emitted by mbedtls_x509_dn_gets().

davidhorstmann-arm avatar Sep 15 '23 14:09 davidhorstmann-arm

Closing as fixed

davidhorstmann-arm avatar Oct 03 '23 16:10 davidhorstmann-arm

Sorry @davidhorstmann-arm for not reacting earlier, I will re-test my code as soon as I get a chance, and reopen the issue if I find any remaining problems. Thanks for your time addressing this, much appreciated, Martin!

mman avatar Oct 03 '23 18:10 mman

@mman no problem at all!

davidhorstmann-arm avatar Oct 04 '23 09:10 davidhorstmann-arm

@davidhorstmann-arm David, just a quick question: I see the PR was merged into the development branch. Will it be backported to any stable version? I can test against 2.28.x and 3.x, but development seems to be very far away for me :)

mman avatar Oct 09 '23 16:10 mman

https://github.com/Mbed-TLS/mbedtls/pull/8025 was part of the 3.5.0 release. It's a new feature so we won't backport it to 2.28 which is a bug-fix-only long-term-support branch.

gilles-peskine-arm avatar Oct 09 '23 16:10 gilles-peskine-arm

@davidhorstmann-arm Just a quick one: I managed to compile my code against the development branch. Not sure where exactly we are with the effort described here https://github.com/Mbed-TLS/mbedtls/issues/6785, but I think my code still fails to properly encode/decode special characters, and UTF-8.

The first argument below is what mbedtls_x509_dn_gets() returns for my certificates; the second one is what was passed into mbedtls_x509write_csr_set_subject_name().

Special characters:

XCTAssertEqual failed: ("Optional("CN=\\ \'\\\"#\\+ \\\"\'\\+")") is not equal to ("Optional("CN= \'\"#+ \"\'+")")

UTF-8 characters:

XCTAssertEqual failed: ("Optional("CN=\\F0\\9F\\98\\80")") is not equal to ("Optional("CN=😀")")

Maybe I am missing something, but I still kind of believe that C string that I pass in should be the C string I get back.

The smiley face emoji is represented as this C string when passed into mbedtls:

(lldb) expr -f Y -- ptr[0]
(CChar) $R5 = 43    C
(lldb) expr -f Y -- ptr[1]
(CChar) $R6 = 4e    N
(lldb) expr -f Y -- ptr[2]
(CChar) $R7 = 3d    =
(lldb) expr -f Y -- ptr[3]
(CChar) $R8 = f0    .
(lldb) expr -f Y -- ptr[4]
(CChar) $R9 = 9f    .
(lldb) expr -f Y -- ptr[5]
(CChar) $R10 = 98    .
(lldb) expr -f Y -- ptr[6]
(CChar) $R11 = 80    .
(lldb) expr -f Y -- ptr[7]
(CChar) $R12 = 00    .

And it's returned not as a UTF-8 encoded C string, but as an escaped string with uppercase hexadecimal literals.

CN=\\F0\\9F\\98\\80
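For anyone who needs the raw bytes back on the reading side, here is a sketch of a decoder in plain C. It assumes the returned string uses a single backslash followed either by two hex digits or by one literal character, as RFC 4514 describes. The name dn_decode_escapes is hypothetical and not part of Mbed TLS.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode "\XX" hex pairs and "\c" single-character
 * escapes from `in` into `out`.
 * Returns the decoded length, or -1 on a dangling backslash, a decoded
 * zero byte, or a too-small output buffer. */
static int dn_decode_escapes(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char c = *in++;
        if (c == '\\') {
            int hi = hex_value(in[0]);
            /* Only look at in[1] when in[0] is a hex digit (so not NUL). */
            int lo = hi < 0 ? -1 : hex_value(in[1]);
            if (lo >= 0) {
                if (hi == 0 && lo == 0) {
                    return -1;    /* "\00" would truncate the C string */
                }
                c = (char) (unsigned char) (hi * 16 + lo);
                in += 2;
            } else if (*in != '\0') {
                c = *in++;        /* "\+" style escape */
            } else {
                return -1;        /* dangling backslash */
            }
        }
        if (n + 1 >= out_size) {
            return -1;
        }
        out[n++] = c;
    }
    if (n >= out_size) {
        return -1;
    }
    out[n] = '\0';
    return (int) n;
}
```

Decoding CN=\F0\9F\98\80 this way gives back the seven bytes of CN=😀 followed by the terminating zero.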

I must be missing something... (this one? https://github.com/Mbed-TLS/mbedtls/issues/7927)

mman avatar Oct 09 '23 17:10 mman

"that C string that I pass in should be the C string I get back"

I'm not familiar with this feature, so I don't know about this specific case. But I don't think you can expect this in general. Certificate creation is supposed to canonicalize strings such as DN. If two inputs to mbedtls_x509write_csr_set_subject_name are considered compatible, then I would expect mbedtls_x509_dn_gets to return one of them, so that they can be tested for equality.

gilles-peskine-arm avatar Oct 09 '23 17:10 gilles-peskine-arm

@mman This is correct. Dealing with UTF-8 properly is addressed by #7927, which is not completed or merged yet.

Currently we do ASCII and then escape things we don't understand (e.g. UTF-8 multibyte). This is a step better than just replacing it with ? as we did previously.

I've declared this particular issue as solved, because it's possible to pass special characters ("+# etc) escaped as \"\+\# and then receive them back escaped on the other end, thus achieving symmetry. The same is possible with UTF-8 multibyte if you're prepared to hex-escape your bytes at the start.
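A sketch of what escaping at the start could look like, in plain C. It assumes the escaping rules of RFC 4514 section 2.4 and an Mbed TLS version that includes the fix (3.5.0 or later). dn_escape_value is a hypothetical name, not an Mbed TLS function, and it must be applied to the attribute value only, not to the "CN=" prefix or to separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: escape one attribute value for use in a subject
 * string. Special characters get a backslash; bytes outside printable
 * ASCII (including UTF-8 multibyte sequences) become "\XX" hex pairs.
 * Returns the escaped length, or -1 if `out` is too small. */
static int dn_escape_value(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, n = 0;

    assert(in != NULL && out != NULL);
    len = strlen(in);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        int hexesc = c < 0x20 || c > 0x7E;
        int backslash =
            strchr("\"+,;<>\\", c) != NULL ||        /* special anywhere */
            (i == 0 && (c == ' ' || c == '#')) ||     /* special when leading */
            (i == len - 1 && c == ' ');               /* special when trailing */
        size_t need = hexesc ? 3 : (backslash ? 2 : 1);

        if (n + need >= out_size) {
            return -1;
        }
        if (hexesc) {
            out[n++] = '\\';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        } else {
            if (backslash) {
                out[n++] = '\\';
            }
            out[n++] = (char) c;
        }
    }
    if (n >= out_size) {
        return -1;
    }
    out[n] = '\0';
    return (int) n;
}
```

The escaped value would then be appended to "CN=" before calling mbedtls_x509write_csr_set_subject_name(). For the two test values from this thread it produces the same strings that mbedtls_x509_dn_gets() was reported to return on the development branch.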

I'm counting proper UTF-8 multibyte support as a separate issue raised in #3865 (and previously by you in #3413). This is not yet fixed and I'm unsure when we'll get capacity to fix it, but we do have a PR in progress in #8113.

davidhorstmann-arm avatar Oct 10 '23 11:10 davidhorstmann-arm