acertmgr icon indicating copy to clipboard operation
acertmgr copied to clipboard

Refactor handling of idna domain names

Open davidklaftenegger opened this issue 4 years ago • 9 comments

This series of commits reworks how translation to IDNA domain names is done. It differs in functionality in only two points: a) the idna module is no longer optional b) on error, no attempt is made to continue operation.

While a) could be patched away, I don't think it is worth maintaining separate code paths that are fragile. As for b), I consider the previous behaviour a bug, where it would attempt to use ACME with the non-translated domain name after printing that an error occurred.

davidklaftenegger avatar May 29 '21 09:05 davidklaftenegger

It seems that either not all code paths have been updaed or there is still a bug. When running the code from this version with an example idna domain I receive the following error:

Generating CSR for ['bläh.example.com'']
Error: Certificate issue/renew failed
       Traceback (most recent call last):
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 176, in main
           cert_get(config)
         File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 58, in cert_get
           cr = tools.new_cert_request(settings['domainlist_idna'], key, must_staple)
         File "/usr/lib/python3/dist-packages/acertmgr/tools.py", line 108, in new_cert_request
           names[0].decode('utf-8') if getattr(names[0], 'decode', None) else
       TypeError: 'map' object is not subscriptable

moepman avatar May 29 '21 15:05 moepman

Maybe it would make sense to print both the human readable domain and the idna domain. That should make it easier to see, what is actually requested from the API as well as what a user would normally enter in the config and/or browser.

Sounds reasonable, but probably is beyond the scope of this PR. Until such a change is implemented, would you prefer the idna or the human-readable version in the log messages?

davidklaftenegger avatar May 29 '21 23:05 davidklaftenegger

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

But I agree that the current code might need some additional comments to really understand what is going on.

Kishi85 avatar May 30 '21 14:05 Kishi85

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

but it does not sanitize anything?

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

davidklaftenegger avatar May 30 '21 15:05 davidklaftenegger

I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.

With regards to printing/logging: Since this PR is not fixing any bugs I'd rather wait for you (or someone else) to implement a logic similar to this: if the domain needs translation: human_readable_domain_name (translated_domain_name) else: domain_name.

moepman avatar May 30 '21 15:05 moepman

I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.

This solution appears possible to me, but will require more changes in other places of the code.

davidklaftenegger avatar May 30 '21 15:05 davidklaftenegger

idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.

but it does not sanitize anything?

Sanitizing is probably the wrong word, but it converts unusable unicode (if present) to the proper format for the certificate request

This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.

I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.

Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:

# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
    for idx in range(0,len(config['domainlist'])):
        if any(ord(c) >= 128 for c in config['domainlist'][idx]):
            domain_human = config['domainlist'][idx]
            domain_idna = idna_convert(domain_human)
            config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
            config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

The map could be then also used for any log messages with a simple domain if domain not in config['domainlist_idnamap'] else '{} ({}).format(domain, config['domainlist_idnamap'][domain]).

This proposal combined with the current state of the PR would be a good simplification of the currently too complex inda_convert funtion.

EDIT: Had to update this before being able to sleep ;) Sorry for the confusion on the earlier iteration of this comment.

Kishi85 avatar May 30 '21 17:05 Kishi85

Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:

# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
    for idx in range(0,len(config['domainlist'])):
        if any(ord(c) >= 128 for c in config['domainlist'][idx]):
            domain_human = config['domainlist'][idx]
            domain_idna = idna_convert(domain_human)
            config['domainlist'][idx] = domain_idna  # Update domain with idna counterpart
            config['domainlist_idnamap'][domain_idna] = domain_human  # Store original domain for reference

For clarification: config['domainlist'] keeps all domains necessary for processing (IDNA ones will replace the human readble ones in the process) in one place and config['domainlist_idnamap'] contains only the mapped domains with their human readable counterpart. This can be used to fix the challange handler setup as well as the already mentioned log message modifications (where applicable). Challenge handler config could look something like this (Replacement for lines 164-166):

        # Update handler config with more specific values (use original names for translated unicode domains)
        specificcfgs = [x for x in handlerconfigs if 'domain' in x and x['domain'] == config['domainlist_idnamap'].get(domain, domain)]

This way my propsal would confine all changes in this PR to tools.py and configuration.py without changes to other parts of the code. Except for logging things, which would have to be added where necessary. Just trying to keep it simple.

Kishi85 avatar May 31 '21 07:05 Kishi85

A little thing i've noticed in regards to this:

a) the idna module is no longer optional

cryptography-0.6 does not pull in idna, so this will only work if you bump the minimum required cryptography version up to a version that does. I've not had the time to figure out which version this would be. The minium valid version for this change also has to meet our goal of supporting versions used distributions that still have mainstream support.

EDIT: Typos and grammar.

Kishi85 avatar Jun 21 '21 20:06 Kishi85