
Add support for Let's Encrypt DNS challenge

ZzMzaw opened this issue 3 years ago • 4 comments

To allow certificate creation and Matrix federation on my base domain without having to serve anything from it, I propose this implementation of Let's Encrypt DNS challenges.

It supports all official certbot DNS plugins (those listed here: https://eff-certbot.readthedocs.io/en/stable/using.html#dns-plugins) that have official Docker images.

The approach is simple. A credentials configuration file must be generated for the DNS plugin, which is invoked during the certificate request. The certbot image that includes the matching plugin is used, and only official certbot images are used. Renewal relies solely on certbot's renewal configuration files, which certbot generates automatically. The documentation should be up to date and sufficient; feel free to raise any concerns about it so that I can improve it.
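The workflow described above can be sketched as a function that builds the `docker run` invocation for one domain. This is a minimal sketch, not the PR's code: the host paths and the function name are hypothetical, and the Route 53 branch assumes a mounted AWS shared-credentials file, since that plugin reads credentials from the standard AWS sources rather than from an `.ini` file.

```python
from typing import List, Optional

# Hypothetical host paths; the playbook's actual directories may differ.
LETSENCRYPT_DIR = "/matrix/ssl"
CREDENTIALS_DIR = "/matrix/ssl/dns-credentials"


def build_certbot_dns_command(domain: str, provider: str, email: str,
                              credentials_file: Optional[str] = None) -> List[str]:
    """Build a `docker run` argument list that obtains a certificate for
    `domain` with the official certbot image for `provider`'s DNS plugin."""
    # Official DNS plugin images are published as certbot/dns-<provider>.
    image = f"certbot/dns-{provider}"
    cmd = ["docker", "run", "--rm", "-v", f"{LETSENCRYPT_DIR}:/etc/letsencrypt"]
    if provider == "route53":
        # Route 53 takes AWS credentials from the standard AWS sources, not
        # from an .ini file; a mounted shared-credentials file is assumed here.
        cmd += ["-v", f"{CREDENTIALS_DIR}/aws:/aws:ro",
                "-e", "AWS_SHARED_CREDENTIALS_FILE=/aws/credentials"]
    else:
        if credentials_file is None:
            raise ValueError(f"the {provider} plugin needs a credentials file")
        cmd += ["-v", f"{CREDENTIALS_DIR}:/credentials:ro"]
    # Everything after the image name is passed to certbot (the entrypoint).
    cmd += [image, "certonly", "--non-interactive", "--agree-tos",
            "--email", email, f"--dns-{provider}", "-d", domain]
    if provider != "route53":
        cmd.append(f"--dns-{provider}-credentials=/credentials/{credentials_file}")
    return cmd
```

Because certbot records the authenticator and the in-container credentials path in its renewal configuration under `/etc/letsencrypt/renewal/`, a later `certbot renew` run with the same image and the same mounts should reuse them.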

Everything is configurable in an opt-in way (the default keeps the current behavior) and is compatible with using both HTTP and DNS challenges at the same time.
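The opt-in behavior can be pictured as a per-domain choice of ACME challenge. In this sketch, `http_domains` and the function name are hypothetical; only the `domain` key mirrors the `matrix_ssl_lets_encrypt_dns_challenge_domains` entries shown later in this thread.

```python
from typing import Dict, Iterable, Mapping


def plan_challenges(http_domains: Iterable[str],
                    dns_challenge_domains: Iterable[Mapping[str, str]] = ()) -> Dict[str, str]:
    """Map each certificate domain to the ACME challenge used to obtain it."""
    # Default: every domain keeps the existing HTTP-01 behavior.
    plan = {domain: "http-01" for domain in http_domains}
    # Opt-in: domains listed for DNS challenges use DNS-01 instead, including
    # domains (such as the base domain) that serve nothing over HTTP.
    for entry in dns_challenge_domains:
        plan[entry["domain"]] = "dns-01"
    return plan
```

With an empty DNS list the result is all `http-01`, which is the "default keeps current behavior" case; mixing both kinds of domains yields a plan that uses both challenges side by side.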

The PR is a draft because I still need to run some tests, but the code should be ready and will only be adapted based on test results and feedback. Regarding tests, I can only test the OVH DNS plugin; the others will remain untested by me. The other official plugins, apart from the AWS one, should work in almost the same way, so this may not be a problem. The AWS solution (whose credentials are handled differently) should work but needs testing.

I am not sure how to handle this situation. I would really appreciate feedback from tests of other DNS plugins (in particular the AWS one), but if that is not possible (and it may not be possible to have someone test all of them anyway), I can propose the following two solutions:

  1. Add a prominent warning in the documentation that some DNS plugins have not been tested, along with the test status of each, but let people use them if they want
  2. Prevent the use of untested DNS plugins (possible during the playbook's validation step), with the associated documentation as well
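Both options could share one validation step that differs only in strictness: warn (option 1) or fail (option 2). A minimal sketch, with hypothetical names; the tested set reflects only what this comment states (OVH) and would grow as reports come in.

```python
import warnings
from typing import Iterable, List, Mapping

# At the time of this comment only the OVH plugin had been tested; this set
# is illustrative and would grow as test reports come in.
TESTED_DNS_PROVIDERS = {"ovh"}


class UntestedDnsProviderError(ValueError):
    """Raised in strict mode (option 2) when an untested plugin is configured."""


def validate_dns_providers(dns_challenge_domains: Iterable[Mapping[str, str]],
                           strict: bool = False) -> List[str]:
    """Return the untested providers in use; warn (option 1) or fail (option 2)."""
    used = {entry["provider"] for entry in dns_challenge_domains}
    untested = sorted(used - TESTED_DNS_PROVIDERS)
    if untested and strict:
        raise UntestedDnsProviderError("untested DNS plugins: " + ", ".join(untested))
    for provider in untested:
        warnings.warn(f"DNS plugin '{provider}' has not been tested", stacklevel=2)
    return untested
```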

What do you think would be the best way to handle it?

Last but not least, I cannot spend as much time on this as I would like (like most of us, I would say), so don't worry if I answer slowly (anywhere from days to weeks). Rest assured, I will check back here regularly, as soon as I can. Thanks for your understanding and for this great playbook.

ZzMzaw, Jun 11 '22

I just noticed that self-check doesn't work with my current configuration. I assume this is because I changed the certificate used for the federation endpoint. Self-check seems to expect matrix.<base-domain>, but when I run the Matrix Federation Tester (https://federationtester.matrix.org/) against my base domain, it succeeds.
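The assumption above, that a certificate issued only for the base domain does not cover `matrix.<base-domain>`, can be illustrated with simplified hostname matching against a certificate's subjectAltName entries. This sketch handles only a full leftmost `*` label (as most clients do) and uses `example.com` as a placeholder domain.

```python
from typing import Iterable


def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 matching: a leftmost '*' label matches one label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # wildcards never span multiple labels
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def cert_covers(san_names: Iterable[str], hostname: str) -> bool:
    """True if any subjectAltName DNS entry matches `hostname`."""
    return any(name_matches(name, hostname) for name in san_names)
```

Under these rules, a certificate whose only name is `example.com` fails verification for `matrix.example.com`, which is consistent with a check that connects to `matrix.<base-domain>` while the Federation Tester, querying the base domain, succeeds.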

ZzMzaw, Jun 11 '22

Wow, this is a huge PR! Thank you for spending so much time and effort to get Let's Encrypt DNS challenge support into the playbook!

I gave this a quick look, but haven't reviewed it in much detail yet.

spantaleev, Jun 11 '22

This PR has been running on my small production server for the last 3 months, and certificate renewal has worked properly. I merged the latest commit and spent some time fixing all ansible-lint issues (sorry for the many force pushes).

From my perspective, the PR is ready for review.

There are still two points to consider before merging:

  1. Self-check is still failing
  2. I was only able to test OVH DNS provider

Regarding point 1), I had to disable the SSL check for "Check Matrix Federation API" (roles/matrix-synapse/tasks/self_check_federation_api.yml) because the federation endpoint's SSL certificate is only valid for <base-domain>, not for matrix.<base-domain>. I am not sure this is the right approach, even though the Matrix Federation Tester (https://federationtester.matrix.org/) succeeds against my base domain. Even after that change, self-check failed at "Check .well-known on the identity hostname" (roles/matrix-nginx-proxy/tasks/self_check_well_known_file.yml) because it expects <base-domain> to expose .well-known. I was planning to run the identity-server-related checks only when matrix_well_known_matrix_server_enabled is true. Would that be the right approach (without risking issues elsewhere)?
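The proposed gating can be expressed as a small decision function over the playbook variables. This is a sketch of the idea only: the function name is hypothetical, and treating a missing `matrix_well_known_matrix_server_enabled` as true is an assumption, not necessarily the role's actual default.

```python
from typing import List, Mapping


def plan_self_checks(config: Mapping[str, object]) -> List[str]:
    """Return the self-check names to run for a given set of variables."""
    checks = ["Check Matrix Federation API"]
    # Proposed gating: only check .well-known when the playbook is expected to
    # serve it. The default of True is an assumption; check the role defaults.
    if config.get("matrix_well_known_matrix_server_enabled", True):
        checks.append("Check .well-known on the identity hostname")
    return checks
```

In the playbook itself this would most likely become a `when:` condition on the existing task rather than separate logic.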

Regarding point 2), I can propose the following two solutions:

  • Add a prominent warning in the documentation that some DNS plugins have not been tested, along with the test status of each, but let people use them if they want
  • Prevent the use of untested DNS plugins (possible during the playbook's validation step), with the associated documentation as well

What do you think would be the best way to handle it?

Thanks for your feedback.

ZzMzaw, Sep 05 '22

Thanks for your work on this PR. I have been testing it with the Cloudflare provider on a non-federated Matrix instance, where Let's Encrypt certificates are desirable but publicly exposing the instance to facilitate HTTP challenges would add complexity to the setup.

Although I haven't reached a renewal cycle yet, certificate acquisition worked as expected. One minor issue I encountered was challenges timing out before DNS had a chance to propagate. Increasing the --dns-cloudflare-propagation-seconds value from the default (5 seconds) resolved the issue.

Perhaps this could be exposed as an optional parameter in matrix_ssl_lets_encrypt_dns_challenge_domains, for example:

```yaml
matrix_ssl_lets_encrypt_dns_challenge_domains:
  - domain: '{{ matrix_domain }}'
    provider: 'cloudflare'
    config_file: 'my.cloudflare.apitoken'
  - domain: 'matrix.{{ matrix_domain }}'
    provider: 'cloudflare'
    config_file: 'my.cloudflare.apitoken'
    propagation_seconds: 30
  - domain: 'element.{{ matrix_domain }}'
    provider: 'cloudflare'
    config_file: 'my.cloudflare.apitoken'
  - domain: 'grafana.someother.domain'
    provider: 'ovh'
    config_file: 'ovh.ini'
    propagation_seconds: 60
```

This would need a corresponding block in setup_ssl_lets_encrypt_obtain_for_domain.yml, for example:

```jinja
{% if domain_config.propagation_seconds is defined %}
    --dns-{{ domain_config.provider }}-propagation-seconds={{ domain_config.propagation_seconds }}
{% endif %}
```
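The effect of that Jinja conditional on the final certbot arguments can be sketched in plain Python. The function name and the `/credentials` prefix are hypothetical; the keys (`domain`, `provider`, `config_file`, `propagation_seconds`) follow the YAML example in this comment, and the credentials flag is only emitted when `config_file` is set.

```python
from typing import Any, List, Mapping


def dns_plugin_args(domain_config: Mapping[str, Any],
                    credentials_dir: str = "/credentials") -> List[str]:
    """Turn one matrix_ssl_lets_encrypt_dns_challenge_domains entry into
    certbot DNS-plugin arguments, including the optional propagation delay."""
    provider = domain_config["provider"]
    args = [f"--dns-{provider}"]
    if "config_file" in domain_config:
        args.append(f"--dns-{provider}-credentials={credentials_dir}/{domain_config['config_file']}")
    args += ["-d", domain_config["domain"]]
    # Only emit the flag when the entry sets it, so certbot's plugin default
    # applies otherwise (same as the `is defined` check in the template).
    if "propagation_seconds" in domain_config:
        seconds = int(domain_config["propagation_seconds"])
        args.append(f"--dns-{provider}-propagation-seconds={seconds}")
    return args
```

Keeping the parameter optional per entry lets slow-propagating providers get a longer wait without changing the behavior for everyone else.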

coffeebucket, Sep 11 '22

I think this is kind of obsolete with Traefik becoming the default and nginx-proxy being phased out?

rltas, May 11 '23

Indeed, it is! Closing

spantaleev, May 11 '23