mbedtls
Asymmetry when handling special chars with mbedtls_x509write_csr_set_subject_name and mbedtls_x509_dn_gets
Summary
I am generating certificates with (for historical reasons) special characters in the subject name, such as + and #.
The certificate subject field is set as a zero-terminated C string, without any special quoting or escaping, via mbedtls_x509write_csr_set_subject_name, and parsed back via mbedtls_x509_dn_gets(buf, size, &crt.subject).
For example (simplified):
// mbedtls-2.16 code
// writing subject string 'CN=+#'
mbedtls_x509write_csr_set_subject_name(&crtw, "CN=+#");
// reading into buf
mbedtls_x509_dn_gets(buf, size, &crt.subject);
// buf now contains 'CN=+#'
Under mbedtls-2.16 releases, reading back a subject with special characters returned it as is, so the input string and the output string were identical.
Since mbedtls-2.28, and continuing into the development branch, mbedtls_x509_dn_gets produces escaped output in which every special character is preceded by a double backslash \\
// mbedtls-2.28 code
// writing subject string 'CN=+#'
mbedtls_x509write_csr_set_subject_name(&crtw, "CN=+#");
// reading into buf
mbedtls_x509_dn_gets(buf, size, &crt.subject);
// buf now contains 'CN=\\+\\#'
This is probably related to the PR https://github.com/Mbed-TLS/mbedtls/pull/5861, merged into development and mbedtls-2.28 around May 2022.
As a result, the parsed subject string is inconsistent with the one passed into mbedtls_x509write_csr_set_subject_name.
Trying to escape the string passed into mbedtls_x509write_csr_set_subject_name fails with the mbedtls error MBEDTLS_ERR_X509_INVALID_NAME.
So if I understand everything correctly, we now have an asymmetry: setting the subject field requires special chars to be unescaped, but parsing the subject field returns all special chars escaped.
I am currently working around the issue by removing the two consecutive backslashes \\ from the parsed subject field to undo the escaping, but I am afraid I may not be understanding everything perfectly well.
Expected behavior
As a developer I would expect symmetry between the set/get calls, with mbedtls_x509write_csr_set_subject_name accepting its input and mbedtls_x509_dn_gets producing its output in an identical format.
Actual behavior
mbedtls_x509write_csr_set_subject_name requires no escaping of special chars.
mbedtls_x509_dn_gets produces special chars escaped with a double backslash \\.
Thanks for flagging this up!
I've done some preliminary investigation and I can confirm that there seems to be improper handling of escaped special characters in mbedtls_x509_string_to_names(). I can reproduce the errors with our example programs.
However, I haven't been able to reproduce the double-backslash \\+ escaping. Are you sure this is in Mbed TLS rather than the way you're outputting the string? Given our current escaping mechanism I would expect either single-escaped \+ or double-escaped \\\+.
The \\+ looks like an artifact of our escaping to \+ followed by a more normal escaping that doesn't treat + as a special character (so only the \ is escaped).
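The double-escaping hypothesis above can be illustrated with a small sketch. It assumes a hypothetical generic printer, generic_escape(), that escapes only backslashes and double quotes, the way many debuggers and test frameworks do when they display a string. It is not part of Mbed TLS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Mbed TLS function): escapes only '\' and '"',
 * as a generic string printer might. Returns 0 on success, -1 if out is
 * too small to hold the result and its terminating NUL. */
int generic_escape(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    for (size_t i = 0; in[i] != '\0'; i++) {
        int needs_escape = (in[i] == '\\' || in[i] == '"');
        size_t need = needs_escape ? 2 : 1;

        if (o + need >= out_size) {
            return -1;
        }
        if (needs_escape) {
            out[o++] = '\\';
        }
        out[o++] = in[i];
    }
    out[o] = '\0';
    return 0;
}
```

Feeding it the singly-escaped subject CN=\+\# gives CN=\\+\\#: the + and # are left alone and only the backslashes are doubled, which is the pattern described in the report.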
Sorry @davidhorstmann-arm it took me so long to get back into it.
I am actually interfacing with mbedtls from Swift, so I had to double-check what I am looking at and who is escaping what, but here are the results:
when making a cert, I pass in a UTF-8 zero terminated C-string that looks like this (13 bytes including trailing zero), no escaping:
CN= '"#+ "'+
(lldb) expr -f Y -- specialSubject.utf8CString[0]
(CChar) $R0 = 43 C
(lldb) expr -f Y -- specialSubject.utf8CString[1]
(CChar) $R1 = 4e N
(lldb) expr -f Y -- specialSubject.utf8CString[2]
(CChar) $R2 = 3d =
(lldb) expr -f Y -- specialSubject.utf8CString[3]
(CChar) $R3 = 20
(lldb) expr -f Y -- specialSubject.utf8CString[4]
(CChar) $R4 = 27 '
(lldb) expr -f Y -- specialSubject.utf8CString[5]
(CChar) $R5 = 22 "
(lldb) expr -f Y -- specialSubject.utf8CString[6]
(CChar) $R6 = 23 #
(lldb) expr -f Y -- specialSubject.utf8CString[7]
(CChar) $R7 = 2b +
(lldb) expr -f Y -- specialSubject.utf8CString[8]
(CChar) $R8 = 20
(lldb) expr -f Y -- specialSubject.utf8CString[9]
(CChar) $R9 = 22 "
(lldb) expr -f Y -- specialSubject.utf8CString[10]
(CChar) $R10 = 27 '
(lldb) expr -f Y -- specialSubject.utf8CString[11]
(CChar) $R11 = 2b +
(lldb) expr -f Y -- specialSubject.utf8CString[12]
(CChar) $R12 = 00 .
and when I parse back the cert its subject is set to this UTF-8 zero terminated C-string (18 bytes including trailing 0):
(lldb) expr -f Y -- buf[0]
(Int8) $R39 = 43 C
(lldb) expr -f Y -- buf[1]
(Int8) $R40 = 4e N
(lldb) expr -f Y -- buf[2]
(Int8) $R41 = 3d =
(lldb) expr -f Y -- buf[3]
(Int8) $R42 = 20
(lldb) expr -f Y -- buf[4]
(Int8) $R43 = 27 '
(lldb) expr -f Y -- buf[5]
(Int8) $R44 = 5c \
(lldb) expr -f Y -- buf[6]
(Int8) $R45 = 22 "
(lldb) expr -f Y -- buf[7]
(Int8) $R46 = 5c \
(lldb) expr -f Y -- buf[8]
(Int8) $R47 = 23 #
(lldb) expr -f Y -- buf[9]
(Int8) $R48 = 5c \
(lldb) expr -f Y -- buf[10]
(Int8) $R49 = 2b +
(lldb) expr -f Y -- buf[11]
(Int8) $R50 = 20
(lldb) expr -f Y -- buf[12]
(Int8) $R51 = 5c \
(lldb) expr -f Y -- buf[13]
(Int8) $R52 = 22 "
(lldb) expr -f Y -- buf[14]
(Int8) $R53 = 27 '
(lldb) expr -f Y -- buf[15]
(Int8) $R54 = 5c \
(lldb) expr -f Y -- buf[16]
(Int8) $R55 = 2b +
(lldb) expr -f Y -- buf[17]
(Int8) $R56 = 00 .
And when I try to feed in the subject escaped exactly as it was returned, it is not accepted by mbedtls_x509write_csr_set_subject_name, which returns error -9088 (-0x2380, MBEDTLS_ERR_X509_INVALID_NAME).
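The byte dump above shows a single backslash in front of each escaped character. A minimal sketch of the workaround on the reading side, assuming only this backslash-plus-character form appears in the output: dn_strip_escapes() is a hypothetical helper, not an Mbed TLS function, and it does not handle hex escapes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Mbed TLS function): drops each backslash and
 * keeps the character it protects. A lone trailing backslash is copied
 * unchanged. Returns 0 on success, -1 if out is too small. */
int dn_strip_escapes(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '\\' && in[i + 1] != '\0') {
            i++; /* skip the backslash, copy the next character */
        }
        if (o + 1 >= out_size) {
            return -1;
        }
        out[o++] = in[i];
    }
    if (o >= out_size) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

Applied to the 18-byte buffer shown above, this yields the original 13-byte subject again.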
@mman thanks for the extra info, it looks like special characters are only singly-escaped, which is a relief.
As you've correctly identified, there's an asymmetry between the characters we escape in mbedtls_x509_dn_gets() and the escaped characters we accept in mbedtls_x509write_csr_set_subject_name(). The fix for this will move us to escaping all special characters properly, since we're moving towards compliance with RFC 4514.
This looks like the same issue as #1865 - mbedtls_x509_string_to_names does not handle special characters properly. I'll close that as a duplicate.
This should now have been fixed by the closing of #7924 via #8025.
mbedtls_x509write_csr_set_subject_name() should now accept the special characters escaped in the same way as they are emitted by mbedtls_x509_dn_gets().
Closing as fixed
Sorry @davidhorstmann-arm for not reacting earlier, I will re-test my code as soon as I get a chance, and reopen the issue if I find any remaining problems. Thanks for your time addressing this, much appreciated, Martin!
@mman no problem at all!
@davidhorstmann-arm David, just a quick question: I see the PR was merged into the development branch. Will it be backported to any stable version? I can test against 2.28.x and 3.x, but development seems to be very far away for me :)
https://github.com/Mbed-TLS/mbedtls/pull/8025 was part of the 3.5.0 release. It's a new feature so we won't backport it to 2.28 which is a bug-fix-only long-term-support branch.
@davidhorstmann-arm Just a quick one: I managed to compile my code against the development branch. Not sure where exactly we are with the effort described here https://github.com/Mbed-TLS/mbedtls/issues/6785, but I think my code still fails to properly encode/decode special characters, and UTF-8.
The first argument below is what mbedtls_x509_dn_gets() returns for my certificates, the second one is what was passed into mbedtls_x509write_csr_set_subject_name().
Special characters:
XCTAssertEqual failed: ("Optional("CN=\\ \'\\\"#\\+ \\\"\'\\+")") is not equal to ("Optional("CN= \'\"#+ \"\'+")")
UTF-8 characters:
XCTAssertEqual failed: ("Optional("CN=\\F0\\9F\\98\\80")") is not equal to ("Optional("CN=😀")")
Maybe I am missing something, but I still kind of believe that the C string that I pass in should be the C string I get back.
The smiley face emoji is passed into mbedtls as this C string:
(lldb) expr -f Y -- ptr[0]
(CChar) $R5 = 43 C
(lldb) expr -f Y -- ptr[1]
(CChar) $R6 = 4e N
(lldb) expr -f Y -- ptr[2]
(CChar) $R7 = 3d =
(lldb) expr -f Y -- ptr[3]
(CChar) $R8 = f0 .
(lldb) expr -f Y -- ptr[4]
(CChar) $R9 = 9f .
(lldb) expr -f Y -- ptr[5]
(CChar) $R10 = 98 .
(lldb) expr -f Y -- ptr[6]
(CChar) $R11 = 80 .
(lldb) expr -f Y -- ptr[7]
(CChar) $R12 = 00 .
And it's returned not as a UTF-8 encoded C string, but as an escaped string with uppercase hexadecimal literals.
CN=\\F0\\9F\\98\\80
I must be missing something... (this one? https://github.com/Mbed-TLS/mbedtls/issues/7927)
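To compare the returned string with the original input, the escaping has to be undone on the caller's side. A minimal sketch, assuming the output uses only the two forms seen in this thread (backslash plus one character, and backslash plus two hex digits): dn_unescape() and hex_value() are hypothetical helpers, not Mbed TLS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper (not an Mbed TLS function): turns "\XX" into the byte
 * 0xXX and "\c" into c. Returns 0 on success, -1 if out is too small or a
 * hex escape decodes to a NUL byte, which a C string cannot carry. */
int dn_unescape(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];

        if (c == '\\' && in[i + 1] != '\0') {
            int hi = hex_value(in[i + 1]);
            /* in[i + 2] is safe to read: in[i + 1] is not the terminator */
            int lo = (hi >= 0) ? hex_value(in[i + 2]) : -1;

            if (hi >= 0 && lo >= 0) {
                if (hi == 0 && lo == 0) {
                    return -1;
                }
                c = (char) (unsigned char) ((hi << 4) | lo);
                i += 2;
            } else {
                c = in[i + 1];
                i += 1;
            }
        }
        if (o + 1 >= out_size) {
            return -1;
        }
        out[o++] = c;
    }
    if (o >= out_size) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

With this, CN=\F0\9F\98\80 decodes back to the four UTF-8 bytes of the emoji, so the comparison against the original subject can be made on the decoded string.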
"that C string that I pass in should be the C string I get back"
I'm not familiar with this feature, so I don't know about this specific case. But I don't think you can expect this in general. Certificate creation is supposed to canonicalize strings such as DN. If two inputs to mbedtls_x509write_csr_set_subject_name are considered compatible, then I would expect mbedtls_x509_dn_gets to return one of them, so that they can be tested for equality.
@mman This is correct. Dealing with UTF-8 properly is addressed by #7927, which is not completed or merged yet.
Currently we do ASCII and then escape things we don't understand (e.g. UTF-8 multibyte). This is a step better than just replacing it with ? as we did previously.
I've declared this particular issue as solved, because it's possible to pass special characters ("+# etc) escaped as \"\+\# and then receive them back escaped on the other end, thus achieving symmetry. The same is possible with UTF-8 multibyte if you're prepared to hex-escape your bytes at the start.
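Pre-escaping on the caller's side could look like the sketch below. It follows the RFC 4514 rules for an attribute value (escape " + , ; < > \ everywhere, # or space at the start, space at the end) and hex-escapes every byte outside printable ASCII. dn_escape_value() is a hypothetical helper, not an Mbed TLS function, and whether a given Mbed TLS release accepts exactly this form is an assumption that should be checked against that release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Mbed TLS function): escapes one attribute
 * value (the part after "CN="). Returns 0 on success, -1 if out is too
 * small to hold the result and its terminating NUL. */
int dn_escape_value(const char *value, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, o = 0;

    assert(value != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    len = strlen(value);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) value[i];
        int as_hex = (c < 0x20 || c >= 0x7F);
        int special = !as_hex &&
                      (strchr("\"+,;<>\\", c) != NULL ||
                       (i == 0 && (c == '#' || c == ' ')) ||
                       (i == len - 1 && c == ' '));
        size_t need = as_hex ? 3 : (special ? 2 : 1);

        if (o + need >= out_size) {
            return -1;
        }
        if (as_hex) {
            out[o++] = '\\';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            if (special) {
                out[o++] = '\\';
            }
            out[o++] = (char) c;
        }
    }
    out[o] = '\0';
    return 0;
}
```

The escaped value would then be appended to "CN=" before the subject is handed to mbedtls_x509write_csr_set_subject_name(). Escaping only the value, not the whole subject string, keeps the = and , separators between attributes intact.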
I'm counting proper UTF-8 multibyte support as a separate issue raised in #3865 (and previously by you in #3413). This is not yet fixed and I'm unsure when we'll get capacity to fix it, but we do have a PR in progress in #8113.