otp icon indicating copy to clipboard operation
otp copied to clipboard

[public_key] Use string:casefold when comparing dirs

Open timofey-barmin opened this issue 2 years ago • 10 comments

to_lower doesn't work correctly in the case when utf8String is in the list of dirs:

1> string:to_lower("Юникод"). "Юникод" 2> string:casefold("Юникод"). "юникод"

Unfortunately I could not find any unittests for this function, but since to_lower is obsolete anyway I hope this will be fine

UPDATE:

Which public_key function / functionality is it that misbehaves?

If we talk about public_key, then it affects public_key:pkix_is_issuer/2. But it also affects certs validation in ssl.

Say we have a cert chain: "Leaf Cert" <- "Root Cert" where "LeafCert" contains AuthKeyIdentifier extension with {utf8String, <<"Юникод"/utf8>>} while "RootCert" uses {utf8String, <<"юникод"/utf8>>} in the subject (the only difference is the case of the first char).

If we try to call pkix_is_issuer in this situation it will return false, because to_lower will not actually lower the case for "Юникод" when we do the comparison in is_dir_name2. More specifically, is_dir_name2({utf8String, _}, {utf8String, _}) will be transformed to is_dir_name2({printableString, _}, {printableString, _}) by the 3rd and 4th clause first, and then the 2nd clause of is_dir_name2 will fail to lower the case, so the comparison will return false, while the expected result is true in this case.

timofey-barmin avatar Jul 13 '22 01:07 timofey-barmin

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 13 '22 01:07 CLAassistant

CT Test Results

    2 files    14 suites   3m 57s :stopwatch: 205 tests 202 :heavy_check_mark: 3 :zzz: 0 :x: 220 runs  217 :heavy_check_mark: 3 :zzz: 0 :x:

Results for commit c62a60db.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Jul 13 '22 01:07 github-actions[bot]

Which public_key function / functionality is it that misbehaves?

The clause you want to fix handles printableString, so the question is if that is allowed to be Unicode or if the code's assumption that it is 8-bit ASCII in fact is correct. Another question is if case should be folded here and if so why is it not folded for utf8String.

RaimoNiskanen avatar Jul 13 '22 10:07 RaimoNiskanen

Which public_key function / functionality is it that misbehaves?

If we talk about public_key, then it affects public_key:pkix_is_issuer/2. But it also affects certs validation in ssl.

Say we have a cert chain: "Leaf Cert" <- "Root Cert" where "LeafCert" contains AuthKeyIdentifier extension with {utf8String, <<"Юникод"/utf8>>} while "RootCert" uses {utf8String, <<"юникод"/utf8>>} in the subject (the only difference is the case of the first char).

If we try to call pkix_is_issuer in this situation it will return false, because to_lower will not actually lower the case for "Юникод" when we do the comparison in is_dir_name2. More specifically, is_dir_name2({utf8String, _}, {utf8String, _}) will be transformed to is_dir_name2({printableString, _}, {printableString, _}) by the 3rd and 4th clause first, and then the 2nd clause of is_dir_name2 will fail to lower the case, so the comparison will return false, while the expected result is true in this case.

I will update the PR description with the above information.

The clause you want to fix handles printableString, so the question is if that is allowed to be Unicode or if the code's assumption that it is 8-bit ASCII in fact is correct. Another question is if case should be folded here and if so why is it not folded for utf8String.

The clause that handles utf8String converts the utf8 binary to list and calls the clause with printableString, which does the comparison. That's why I'm fixing the clause with printableString. And according to that code we can't assume that it is 8-bit ASCII. At least in this very function.

timofey-barmin avatar Jul 13 '22 16:07 timofey-barmin

The type definitions in public_key uses {printableString, string()} meaning a list of Unicode character points, but the question is if that is correct according to the RFC:s...

I find in total 9 calls to string:to_lower/1 in that file. The question is if any of them are correct, so this needs a thorough overhaul.

The guys that know public_key should have a look at this when they are back from vacation.

RaimoNiskanen avatar Jul 14 '22 09:07 RaimoNiskanen

I think the logic patch would look like this:

diff --git a/lib/public_key/src/pubkey_cert.erl b/lib/public_key/src/pubkey_cert.erl
index 630981c841..f266ddc1ea 100644
--- a/lib/public_key/src/pubkey_cert.erl
+++ b/lib/public_key/src/pubkey_cert.erl
@@ -672,9 +672,12 @@ is_dir_name2(Value, Value) -> true;
 is_dir_name2({printableString, Value1}, {printableString, Value2}) ->
     string:to_lower(strip_spaces(Value1)) =:= 
        string:to_lower(strip_spaces(Value2));
-is_dir_name2({utf8String, Value1}, String) ->
+is_dir_name2({utf8String, Value1}, {utf8String, Value2}) ->
+    string:casefold(strip_spaces(Value1)) =:= 
+       string:casefold(strip_spaces(Value2));
+is_dir_name2({utf8String, Value1}, {printableString, _} = String) ->
     is_dir_name2({printableString, unicode:characters_to_list(Value1)}, String);
-is_dir_name2(String, {utf8String, Value1}) ->
+is_dir_name2({printableString, _} = String, {utf8String, Value1}) ->
     is_dir_name2(String, {printableString, unicode:characters_to_list(Value1)});
 is_dir_name2(_, _) ->
     false.

Depending on it could be real utf8strings that should be compared or an incorrectly decoded field that should be compared with an ASN-1 printableString.

IngelaAndin avatar Aug 01 '22 12:08 IngelaAndin

The definition of what characters that are allowed in an ASN.1 Printablestring are

ASN.1 PrintableString tag: 19 The ASN.1 PrintableString type supports upper case letters "A" through "Z", lower case letters "a" through "z", the digits "0" through "9", space, and common punctuation marks. Note that PrintableString does not support the "@", "&", and "*" characters.

So tolower should be ok for that type but not UTF8string

KennethL avatar Aug 03 '22 18:08 KennethL

Thinking some more about this I think actually this would be good. As string:casefold will work for all cases but might be unnecessary slow if we know that both values are of ASN-1 type printableString.

diff --git a/lib/public_key/src/pubkey_cert.erl b/lib/public_key/src/pubkey_cert.erl
index 630981c841..fdfc732bdd 100644
--- a/lib/public_key/src/pubkey_cert.erl
+++ b/lib/public_key/src/pubkey_cert.erl
@@ -672,10 +672,12 @@ is_dir_name2(Value, Value) -> true;
 is_dir_name2({printableString, Value1}, {printableString, Value2}) ->
     string:to_lower(strip_spaces(Value1)) =:= 
        string:to_lower(strip_spaces(Value2));
-is_dir_name2({utf8String, Value1}, String) ->
-    is_dir_name2({printableString, unicode:characters_to_list(Value1)}, String);
-is_dir_name2(String, {utf8String, Value1}) ->
-    is_dir_name2(String, {printableString, unicode:characters_to_list(Value1)});
+is_dir_name2({utf8String, Value1}, {_, Value2}) ->
+    string:casefold(strip_spaces(Value1)) =:= 
+        string:casefold(strip_spaces(Value2));
+is_dir_name2({_, Value1}, {utf8String, Value2}) ->
+    string:casefold(strip_spaces(Value1)) =:= 
+        string:casefold(strip_spaces(Value2));
 is_dir_name2(_, _) ->
     false.
 


@timofey-barmin would you mind making this the solution?

IngelaAndin avatar Aug 05 '22 10:08 IngelaAndin

@IngelaAndin Thanks for looking into it.

I'm not sure your approach will work in this form. Utf8string is a binary, while strip_spaces expects a list. So we would still need a conversion to list for this. This is certainly can be done, but now I have another concern actually. After taking a deeper look I start to think that maybe the problem is a bit bigger that I thought. I noticed that there is the normalize_general_name function which seems to be exactly what we need here - normalize both params and then compare. But the problem is that this function knows nothing about utf8string. Which actually means we have another type of problems. For example something like the following: Say some CA cert uses printableName in the subject, while the leaf cert issued by that CA uses utf8string in AuthorityKeyIdentifier. In this case we won't be able to find the CA in the db. The interesting part is that I actually hit this bug in erl 20. I thought it was fixed in newer version, but now I see that it actually was not. This scenario works now but just because find_cross_sign_root_paths was added. It simply tries each and every cert in the db (which is much slower I guess than finding the right cert by name using ets lookup). Giving all of that it seems like the something like the following should make sense:

diff --git a/lib/public_key/src/pubkey_cert.erl b/lib/public_key/src/pubkey_cert.erl
index 0d73d57ecd..ac3cfab098 100644
--- a/lib/public_key/src/pubkey_cert.erl
+++ b/lib/public_key/src/pubkey_cert.erl
@@ -560,6 +560,10 @@ do_normalize_general_name(Issuer) ->
     Normalize = fun([{Description, Type, {printableString, Value}}]) ->
                        NewValue = string:to_lower(strip_spaces(Value)),
                        [{Description, Type, {printableString, NewValue}}];
+                   ([{Description, Type, {utf8String, Value}}]) ->
+                        CharList = unicode:characters_to_list(Value),
+                        NewValue = strip_spaces(string:lowercase(CharList)),
+                        [{Description, Type, {printableString, NewValue}}];
                   (Atter)  ->
                        Atter
                end,

but it changes the public_key interface so I'm not sure it is the right way to do that (maybe we should convert printableString to utf8string instead). That change would make sure that we normalize the utf8string before adding the cert to db, which means we will be able to find the CA in the described above scenario (I tested it locally and it actually works).

Also if we implement something like that we can then normalize both params of is_dir_name and compare paths components using =:= (which means removal of is_dir_name2) (I didn't test this part yet).

Please let me know if you think it doesn't make any sense.

timofey-barmin avatar Aug 09 '22 23:08 timofey-barmin

Well, the purpose of normalize_name is to make a "lookup-friendly" term. find_cross_sign_root_paths is meant to handle a special case of phasing out old root certs about to expire and finding an alternative one. Also, the finding of an issuer by looking through the database is supposed to be a fallback to handle old style certs that do not have any AuthorityKeyIdentifier extension. A not backward compatible change would normally require it to be done for the next major release. However it was quite a while since I read the PKIX RFC, and there have been updates since. So I found the following in RFC

This specification requires only a subset of the name comparison
  functionality specified in the X.500 series of specifications.
  Conforming implementations are REQUIRED to implement the following
  name comparison rules:

     (a)  attribute values encoded in different types (e.g.,
     PrintableString and BMPString) MAY be assumed to represent
     different strings;

     (b) attribute values in types other than PrintableString are case
     sensitive (this permits matching of attribute values as binary
     objects);

     (c)  attribute values in PrintableString are not case sensitive
     (e.g., "Marianne Swanson" is the same as "MARIANNE SWANSON"); and

     (d)  attribute values in PrintableString are compared after
     removing leading and trailing white space and converting internal
     substrings of one or more consecutive white space characters to a
     single space.

Some certificates have been known to use the wrong encoding of some fields.

But valid utf8-strings should be compared equally in the first clause and then the code I think makes sense in the case that a printable string was wrongly encoded as utf8 but has a valid ASCCI value.

is_dir_name2(Value, Value) -> true;
is_dir_name2({printableString, Value1}, {printableString, Value2}) ->
    string:to_lower(strip_spaces(Value1)) =:= 
	string:to_lower(strip_spaces(Value2));
is_dir_name2({utf8String, Value1}, String) ->
    is_dir_name2({printableString, unicode:characters_to_list(Value1)}, String);
is_dir_name2(String, {utf8String, Value1}) ->
    is_dir_name2(String, {printableString, unicode:characters_to_list(Value1)});
is_dir_name2(_, _) ->
    false.

Maybe we need to make normalize handle the know decoding errors (see pubkey_cert_records:transform/2) !?

IngelaAndin avatar Aug 11 '22 10:08 IngelaAndin

@timofey-barmin I believe #7554 fixes the problem in the correct way. The problem is actual different then I first thought. And it seems the latest RFC has changed the take on how to handle the name comparison which we discovered when handling another issue, that is #7546 Sorry for the delay

IngelaAndin avatar Aug 15 '23 08:08 IngelaAndin