Refactor handling of idna domain names
This series of commits reworks how translation to IDNA domain names is done. It differs in functionality in only two points: a) the idna module is no longer optional b) on error, no attempt is made to continue operation.
While a) could be patched away, I don't think it is worth maintaining separate code paths that are fragile. As for b), I consider the previous behaviour a bug, where it would attempt to use ACME with the non-translated domain name after printing that an error occurred.
It seems that either not all code paths have been updaed or there is still a bug. When running the code from this version with an example idna domain I receive the following error:
Generating CSR for ['bläh.example.com'']
Error: Certificate issue/renew failed
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 176, in main
cert_get(config)
File "/usr/lib/python3/dist-packages/acertmgr/__init__.py", line 58, in cert_get
cr = tools.new_cert_request(settings['domainlist_idna'], key, must_staple)
File "/usr/lib/python3/dist-packages/acertmgr/tools.py", line 108, in new_cert_request
names[0].decode('utf-8') if getattr(names[0], 'decode', None) else
TypeError: 'map' object is not subscriptable
Maybe it would make sense to print both the human readable domain and the idna domain. That should make it easier to see, what is actually requested from the API as well as what a user would normally enter in the config and/or browser.
Sounds reasonable, but probably is beyond the scope of this PR. Until such a change is implemented, would you prefer the idna or the human-readable version in the log messages?
idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.
This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.
But I agree that the current code might need some additional comments to really understand what is going on.
idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.
but it does not sanitize anything?
This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.
I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.
I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.
With regards to printing/logging: Since this PR is not fixing any bugs I'd rather wait for you (or someone else) to implement a logic similar to this: if the domain needs translation: human_readable_domain_name (translated_domain_name) else: domain_name.
I think'd I'd prefer the solution @davidklaftenegger proposed to me in private: only translate the domain name where needed. This would also remove the problem of stroing data twice.
This solution appears possible to me, but will require more changes in other places of the code.
idna_convert is currently used as an additional input sanitizer and does only touch domains that really need to be changed to an IDNA pattern (covered by the any(ord(c) > 128 part). Which makes it transparent in all other codepaths.
but it does not sanitize anything?
Sanitizing is probably the wrong word, but it converts unusable unicode (if present) to the proper format for the certificate request
This PR also starts storing domain names twice, which is (unless it is for performance reasons, which it is obviously not in this use case) dislikable because it introduces additional complexity and can be error-prone if not handled correctly in all (future) changes.
I strongly disagree: the previous code stores the domain names three times (once in the list, twice in the translation returned from idna_convert). I consider this a step towards less complexity.
Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:
# Convert unicode to IDNA domains
config['domainlist_idnamap'] = {}
for idx in range(0,len(config['domainlist'])):
if any(ord(c) >= 128 for c in config['domainlist'][idx]):
domain_human = config['domainlist'][idx]
domain_idna = idna_convert(domain_human)
config['domainlist'][idx] = domain_idna # Update domain with idna counterpart
config['domainlist_idnamap'][domain_idna] = domain_human # Store original domain for reference
The map could be then also used for any log messages with a simple domain if domain not in config['domainlist_idnamap'] else '{} ({}).format(domain, config['domainlist_idnamap'][domain]).
This proposal combined with the current state of the PR would be a good simplification of the currently too complex inda_convert funtion.
EDIT: Had to update this before being able to sleep ;) Sorry for the confusion on the earlier iteration of this comment.
Outside configuration it is only stored once in domainlist, the only place where it is used multiple times is within configuration. A simple proposal for config processing with a single domain list + a map storing the mapped domains only inlcuding the essential changes by @davidklaftenegger in this PR would be to replace lines 92-95 with the follwing and update the parts in configurion of challange handlers using domaintranslation accordingly:
# Convert unicode to IDNA domains config['domainlist_idnamap'] = {} for idx in range(0,len(config['domainlist'])): if any(ord(c) >= 128 for c in config['domainlist'][idx]): domain_human = config['domainlist'][idx] domain_idna = idna_convert(domain_human) config['domainlist'][idx] = domain_idna # Update domain with idna counterpart config['domainlist_idnamap'][domain_idna] = domain_human # Store original domain for reference
For clarification: config['domainlist'] keeps all domains necessary for processing (IDNA ones will replace the human readble ones in the process) in one place and config['domainlist_idnamap'] contains only the mapped domains with their human readable counterpart. This can be used to fix the challange handler setup as well as the already mentioned log message modifications (where applicable). Challenge handler config could look something like this (Replacement for lines 164-166):
# Update handler config with more specific values (use original names for translated unicode domains)
specificcfgs = [x for x in handlerconfigs if 'domain' in x and x['domain'] == config['domainlist_idnamap'].get(domain, domain)]
This way my propsal would confine all changes in this PR to tools.py and configuration.py without changes to other parts of the code. Except for logging things, which would have to be added where necessary. Just trying to keep it simple.
A little thing i've noticed in regards to this:
a) the idna module is no longer optional
cryptography-0.6 does not pull in idna, so this will only work if you bump the minimum required cryptography version up to a version that does. I've not had the time to figure out which version this would be. The minium valid version for this change also has to meet our goal of supporting versions used distributions that still have mainstream support.
EDIT: Typos and grammar.