ansible-playbook Add support for certbot and Let's Encrypt certificates

Closes #61.

See related documentation update in https://github.com/plone/training/pull/470.

Nov 14 '19 21:11 stevepiercy

This is a great idea, but we need to 1) make it optional, 2) document it in the ansible playbook docs (not just training) and 3) make it as close to foolproof as possible.

To make it optional, I suggest revising the "when" on the role operation to check the value of some new default variable like install_certbot. We don't want to install new packages and set up new cron jobs without giving the sysadmin an option.
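
A minimal sketch of that opt-in, assuming the role is applied from a play. The variable install_certbot is only the name proposed above and does not exist yet; geerlingguy.certbot is the role discussed in this PR.

```yaml
# Sketch only: install_certbot is the proposed variable, not an existing one.
- hosts: all
  become: true
  vars:
    # Default to off so no packages or cron jobs appear unasked.
    install_certbot: false
  roles:
    - role: geerlingguy.certbot
      # Skip every task in the role unless the sysadmin opts in.
      when: install_certbot | bool
```

With this in place, a run such as ansible-playbook playbook.yml -e install_certbot=true would opt in, since extra vars override play vars.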

Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed. Otherwise, this isn't really complete. I have some code that does this that I'll take a look at for possible inclusion. It may not be compatible with Geerling's approach.

It would also be great if we could figure out how to activate it for a host with just an entry in webserver_virtualhosts -- rather than having separate entries in webserver_virtualhosts and certbot_certs. Again, I don't know if this is compatible with Geerling's role.

Nov 14 '19 23:11 smcmahon

I can make the requested revisions. Why close the PR?

Nov 15 '19 10:11 stevepiercy

Closing was accidental. I meant to just comment.

Nov 15 '19 21:11 smcmahon

I wonder if it wouldn't be better to separate the certbot operations into a separate playbook. This could be modeled on the firewall.yml playbook and could use the local-configure.yml file in the same way to pick up needed variables. This would loosen the coupling and help make it clear to the sysadmin that there are a variety of considerations to be taken into account in employing certbot.
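
As a rough sketch of what such a standalone playbook might look like: the file name certbot.yml and the way local-configure.yml is loaded are assumptions here, and firewall.yml may well do it differently.

```yaml
# certbot.yml: hypothetical file name for the separate playbook.
- hosts: all
  become: true
  # Assumption: local-configure.yml sits next to this playbook and is
  # loaded as a vars file; firewall.yml may load it another way.
  vars_files:
    - local-configure.yml
  roles:
    - geerlingguy.certbot
```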

Nov 15 '19 21:11 smcmahon

I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)

Wouldn't the existing 'certificate_file' and 'key_file' options for a webserver_virtualhosts item do just as well? That's what I've been using already with my own certbot setup.

If that would work, that's one less option to be separately documented and maintained. And no code changes would be needed in the nginx role.

Nov 15 '19 22:11 smcmahon

If the last two comments are adopted, then I think the right way to document the letsencrypt support would be in an added doc in docs. An example in the training docs is also a great idea, of course.

Thanks!

Nov 15 '19 22:11 smcmahon

> I wonder if it wouldn't be better to separate the certbot operations into a separate playbook.

That makes sense, especially considering the step to stop and start the web server so that certbot can run its own server to generate the first-time cert. There are a few more things going on, too.

> I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)

I added it so that we could continue to support copying certificates. When certbot installed the LE cert, it warned me not to move the files, and I don't know whether copying them would be harmful. I used the variable lets_encrypt_certificate to avoid copying them.

From the README:

WARNING: DO NOT MOVE THESE FILES!
         Certbot expects these files to remain in this location in order
         to function properly!

We recommend not moving these files. For more information, see the Certbot
User Guide at https://certbot.eff.org/docs/using.html#where-are-my-certificates.

Anyway, I tried your suggestion, but it failed. These are symlinks to the actual files. example.com is a mask for the actual host.

failed: [example.com] (item={'hostname': 'example.com', 'default_server': True, 'zodb_path': '/Plone', 'address': '157.245.228.22', 'port': 443, 'protocol': 'https', 'certificate_file': '/etc/letsencrypt/live/example.com/fullchain.pem', 'key_file': '/etc/letsencrypt/live/example.com/privkey.pem'}) => {"ansible_loop_var": "item", "changed": false, "item": {"address": "157.245.228.22", "certificate_file": "/etc/letsencrypt/live/example.com/fullchain.pem", "default_server": true, "hostname": "example.com", "key_file": "/etc/letsencrypt/live/example.com/privkey.pem", "port": 443, "protocol": "https", "zodb_path": "/Plone"}, "msg": "Could not find or access '/etc/letsencrypt/live/example.com/fullchain.pem' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
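
For readability, the failing item corresponds to roughly this webserver_virtualhosts entry. It is reconstructed from the error output above, not copied from the actual local-configure.yml. The message indicates that the two paths were looked up on the Ansible controller, while certbot writes them on the remote host.

```yaml
# Reconstructed from the error output; example.com masks the real host.
webserver_virtualhosts:
  - hostname: example.com
    default_server: yes
    zodb_path: /Plone
    address: 157.245.228.22
    port: 443
    protocol: https
    # Both paths exist only on the remote host, as symlinks made by certbot.
    certificate_file: /etc/letsencrypt/live/example.com/fullchain.pem
    key_file: /etc/letsencrypt/live/example.com/privkey.pem
```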

How do you manage the certbot certs?

Nov 16 '19 15:11 stevepiercy

> Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed.

Certbot can handle that, even with the standalone option:

certbot renew --pre-hook "service nginx stop" --post-hook "service nginx start"

... Hooks will only be run if a certificate is due for renewal, so you can run the above command frequently without unnecessarily stopping your webserver.

So this example value can go in the local-configure.yml:

certbot_auto_renew_options: '--quiet --no-self-upgrade
  --pre-hook "service nginx stop" --post-hook "service nginx start"'
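
For context, a fuller fragment might look like the sketch below. The variable names are the ones I recall from the geerlingguy.certbot README and should be checked against the installed version of the role; the email address and domain are placeholders.

```yaml
# Sketch of role settings; verify the variable names against the README
# of the geerlingguy.certbot version you install.
certbot_auto_renew: true
certbot_auto_renew_user: root
# Folded scalar: the three lines below become one option string.
certbot_auto_renew_options: >-
  --quiet --no-self-upgrade
  --pre-hook "service nginx stop"
  --post-hook "service nginx start"
# Create the first certificate with certbot's standalone server.
certbot_create_if_missing: true
certbot_create_method: standalone
# Services the role stops while the standalone server holds port 80.
certbot_create_standalone_stop_services:
  - nginx
certbot_admin_email: admin@example.com   # placeholder
certbot_certs:
  - domains:
      - example.com                      # placeholder
```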

I think I got it, but there might be a chicken-and-egg problem with restarting nginx. I have not verified yet on a clean machine, but here are my assumptions:

1. I need to run playbook.yml to install and configure nginx for certbot.
2. However, nginx will not restart with the webserver_virtualhosts required to use geerlingguy.certbot because I have not yet run certbot.
3. I have to run playbook.yml, then geerlingguy.certbot, then playbook.yml once more to complete everything else.

Here's the process outline, after a VM is set up and has a non-root user. Would you please review and let me know whether I should change it? I'm a hack at this Ansible stuff.

1. Configure local-configure.yml with webserver_virtualhosts, either with certbot as documented (but in a separate file, PR coming) or without certbot.
2. Optionally install geerlingguy.certbot and configure it.
   - cd ansible-playbook
   - git clone https://github.com/geerlingguy/ansible-role-certbot.git geerlingguy.certbot
   - geerlingguy.certbot.yml is already configured for use, but may be edited.
3. Run playbook.yml, then geerlingguy.certbot, then playbook.yml.

Commits and PRs coming shortly.

Nov 17 '19 19:11 stevepiercy

Updated https://github.com/plone/training/pull/470

I added docs in docs. I can also move the additions for LE and certbot from webserver.rst into a separate file.

@smcmahon ready for review. I will test this out later tonight on a clean Digital Ocean VM.

Nov 17 '19 19:11 stevepiercy

I tried this out on a clean DO VM, but I had to manually stop nginx, run the command to create the cert, and restart nginx. I don't know why the role geerlingguy.certbot did not do this, but I suspect it might be due to how variables are parsed by Ansible. Is there some way to debug or get more information about variables that are actually used?

Nov 19 '19 11:11 stevepiercy

Why, yes, there is a debug method for Ansible.

I realized that defaults in the role were not getting overridden by those in my local-configure.yml, so I moved them into the playbook geerlingguy.certbot.yml instead, and that yielded success.
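
A sketch of both ideas together, assuming the values live in the play itself: the debug module prints what Ansible actually resolved before the role runs. The two variables shown are only examples; which ones were actually moved into geerlingguy.certbot.yml is not stated here.

```yaml
# geerlingguy.certbot.yml (sketch): values set in the play override
# the role's defaults.
- hosts: all
  become: true
  vars:
    certbot_create_if_missing: true
    certbot_create_method: standalone
  pre_tasks:
    # pre_tasks run before the roles, so this shows what the role will see.
    - name: Show the value Ansible actually resolved
      debug:
        var: certbot_create_method
  roles:
    - geerlingguy.certbot
```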

I have to do more revisions to this PR, so please hold off merging until I can finish testing.

Nov 19 '19 12:11 stevepiercy

I've hit a roadblock, and I don't know how to fix it. Varnish returns an error message:

Error 503 Backend fetch failed
Backend fetch failed

Guru Meditation:
XID: 27

Varnish cache server

I'm using Python 3 in my playbook.yml. It completes after 3 runs. Along the way:

RUN 1

TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
fatal: [plone-demo.stevepiercy.com]: FAILED! => {"changed": true, "cmd": "supervisorctl stop zeoserver_memmon; supervisorctl remove zeoserver_memmon", "delta": "0:00:00.768966", "end": "2019-11-20 00:39:08.201577", "msg": "non-zero return code", "rc": 1, "start": "2019-11-20 00:39:07.432611", "stderr": "", "stderr_lines": [], "stdout": "zeoserver_memmon: ERROR (no such process)\nERROR: no such process/group: zeoserver_memmon", "stdout_lines": ["zeoserver_memmon: ERROR (no such process)", "ERROR: no such process/group: zeoserver_memmon"]}

RUN 2

The previous issue seems to be resolved, but the next one crops up.

TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
skipping: [plone-demo.stevepiercy.com]

...

TASK [plone.plone_server : Create initial Plone site] *****************************************************************************************************************************************************************************************************************
[WARNING]: Module remote_tmp /home/plone_daemon/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

fatal: [plone-demo.stevepiercy.com]: FAILED! => {
  "changed": true,
  "cmd": [
    "bin/client_reserved",
    "run",
    "scripts/addPloneSite.py"
  ],
  "delta": "0:00:02.518976",
  "end": "2019-11-20 01:01:17.811993",
  "msg": "non-zero return code",
  "rc": 1,
  "start": "2019-11-20 01:01:15.293017",
  "stderr": "Traceback (most recent call last):\n  File \"bin/client_reserved\", line 266, in <module>\n    + sys.argv[1:]))\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main\n    func = ep.load()\n  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load\n    return self.resolve()\n  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve\n    module = __import__(self.module_name, fromlist=['__name__'], level=0)\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>\n    import zc.monitor\n  File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59\n    except Exception, v:\n                    ^\nSyntaxError: invalid syntax",
  "stderr_lines": [
    "Traceback (most recent call last):",
    "  File \"bin/client_reserved\", line 266, in <module>",
    "    + sys.argv[1:]))",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main",
    "    func = ep.load()",
    "  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load",
    "    return self.resolve()",
    "  File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve",
    "    module = __import__(self.module_name, fromlist=['__name__'], level=0)",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>",
    "    import zc.monitor",
    "  File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59",
    "    except Exception, v:",
    "                    ^",
    "SyntaxError: invalid syntax"
  ],
  "stdout": "",
  "stdout_lines": []
}

This appears to be a Python 3 incompatibility.

https://github.com/zopefoundation/zc.monitor/blob/master/src/zc/monitor/__init__.py#L59

Should be:

            except Exception as v:

I tried editing my server's copy of that file and running the playbook one more time, but that had no effect on Varnish. I have been able to reliably reproduce this issue on clean DO VMs.

Can anyone point me in the right direction to troubleshoot this further?

Nov 20 '19 01:11 stevepiercy

As a sanity check, I dropped back to Python 2 for the install, and there was no Varnish error.

I submitted a PR for the zc.monitor issue. Hopefully that resolves the issue with the error reported by Varnish.

Nov 20 '19 07:11 stevepiercy

2 months after setting up a new Plone instance with this configuration, auto renewal fails.

Problem binding to port 80: Could not bind to IPv4 or IPv6.. Skipping.

I don't want to go in manually every 3 months to stop nginx, run certbot, and restart nginx. How do other folks handle letsencrypt automatic renewal?

Jan 14 '20 09:01 stevepiercy

Personally, I have this in root's crontab on all my hosts:

@monthly /usr/bin/certbot renew --post-hook "service nginx restart"
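
If the same crontab entry were managed by Ansible rather than edited by hand, it might look like this sketch, which uses the built-in cron module. The entry name is arbitrary.

```yaml
- hosts: all
  become: true
  tasks:
    - name: Renew certificates monthly from root's crontab
      cron:
        name: certbot renew        # label Ansible uses to find the entry
        user: root
        special_time: monthly      # rendered as @monthly in the crontab
        job: '/usr/bin/certbot renew --post-hook "service nginx restart"'
```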

Jan 15 '20 23:01 fulv

@fulv I tried running the command you have in cron, but it returns the same error message. Do you have the standalone version of certbot?

I scoured letsencrypt's community for answers, but all I found was to manually stop the webserver so that certbot could then bind to port 80 and renew the certificate.

Jan 16 '20 05:01 stevepiercy

@stevepiercy that's also what we do. We have a playbook that

- stops what's running on port 80 (nginx in our case)
- runs renew
- restarts nginx

It's a bit clunky and not 100% uptime, but it kinda works. Anything else requires that you have scripted access to your DNS provider, which we don't. (If you do, you can script the DNS updates that certbot's DNS authentication method requires.)

Jan 16 '20 10:01 polyester

@polyester I'm in the same boat. No DNS hooks for LE. I'm using nginx, per the defaults of this playbook. Can you share a sanitized version of your playbook? It sounds like you still need to manually run it once per quarter, though, but at least it would save a few manual steps. I can deal with that.

Jan 16 '20 11:01 stevepiercy

@stevepiercy it really is the simplest playbook, basically

- name: stop nginx service 
  service: name=nginx state=stopped 
- name: renew cert
  command: certbot renew
- name: start nginx service 
  service: name=nginx state=started

which isn't very refined. Our ansible master is cronnable, but if you have to run it manually of course that is prone to forgetting. Maybe for the playbook having the cronjob on the target do the "stop nginx, renew certificate, start nginx" would be sufficient? (although of course Ansible complains louder and better if for whatever reason nginx doesn't come up again...)
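
One possible refinement, offered as a sketch rather than as what anyone here runs: wrapping the steps in a block with an always section makes Ansible try to start nginx again even when the renewal fails.

```yaml
- hosts: all
  become: true
  tasks:
    - name: Renew certificates with nginx stopped
      block:
        - name: Stop nginx service
          service:
            name: nginx
            state: stopped
        - name: Renew cert
          command: certbot renew
      always:
        # Runs whether or not the renewal succeeded.
        - name: Start nginx service
          service:
            name: nginx
            state: started
```

Because there is no rescue section, a failed renewal still fails the play after nginx has been started again, so the failure stays visible.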

Jan 16 '20 13:01 polyester

I use the nginx version of certbot.

Fulvio

Jan 16 '20 22:01 fulv

I have https://mailinabox.email running and it renews the certs without my quasi-human intervention, if you're looking for possible examples

Feb 01 '20 22:02 tkimnguyen

@tkimnguyen yes, please! I still haven't figured this one out.

Feb 02 '20 00:02 stevepiercy

@tkimnguyen reping. I'd like to see your example.

Feb 09 '20 03:02 stevepiercy

@stevepiercy this is what the certbot docs say about not stopping the webserver during the certificate issuance process: https://certbot.eff.org/docs/using.html#webroot

Feb 09 '20 17:02 tkimnguyen

@tkimnguyen how do you use the webroot option with Plone? I can't figure out the value for --webroot-path for a Plone site. /var/www/html is the default path, but content is not served from there.
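
One possible approach, not confirmed anywhere in this thread: give certbot a dedicated directory and have nginx serve /.well-known/acme-challenge/ from it instead of proxying those requests to Plone. The sketch below covers only the certbot side; the nginx configuration is not shown. The directory, domain, and email are hypothetical placeholders.

```yaml
- hosts: all
  become: true
  vars:
    acme_webroot: /var/www/letsencrypt   # hypothetical directory
    acme_domain: example.com             # placeholder
    acme_email: admin@example.com        # placeholder
  tasks:
    - name: Create the directory certbot writes challenge files into
      file:
        path: "{{ acme_webroot }}"
        state: directory
        mode: "0755"
    - name: Request a certificate without stopping nginx
      command: >-
        certbot certonly --webroot
        --webroot-path {{ acme_webroot }}
        -d {{ acme_domain }}
        --non-interactive --agree-tos
        --email {{ acme_email }}
      args:
        # Skip the request once a certificate for the domain exists.
        creates: "/etc/letsencrypt/live/{{ acme_domain }}/fullchain.pem"
```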

Feb 11 '20 04:02 stevepiercy

The current version of the certbot-nginx plugin is supposedly capable of issuing and renewing with no downtime. There's a discussion of how this is done in the thread at:

https://certbot.eff.org/faq#can-i-issue-a-certificate-without-bringing-down-my-web-server

with some supplementary information from nginx at:

https://www.nginx.com/faq/how-does-zero-downtime-configuration-testingreload-in-nginx-plus-work/

If this is acceptable, that plugin makes things dead simple. I've tried it out in a branch:

https://github.com/plone/ansible-playbook/tree/simplified-certbot

See https://github.com/plone/ansible-playbook/blob/simplified-certbot/docs/certbot.rst for quick documentation.

Mar 17 '20 23:03 smcmahon

@smcmahon I checked out that branch, added an entry to my local-configure.yml, and then ran ansible-playbook -K certbot.yml. It ran along just fine, but ultimately failed at the step "Test renewal with a dry run." with the following error message:

certbot.errors.StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.

I then commented out certificate from local-configure.yml so that the roles/nginx/templates/host.j2 would use the certbot_hosts value. Still got the same error message.

I found that when I ssh in, and issue the command certbot renew --dry-run --nginx, then it works just fine. Without the --nginx flag, the command fails as it defaults to the standalone server.
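
Expressed as a task, the working command might look like the sketch below. I have not checked how the branch's own task is written, so treat the surrounding play as an assumption.

```yaml
- hosts: all
  become: true
  tasks:
    - name: Test renewal with a dry run
      # --nginx selects the nginx plugin instead of the standalone server.
      command: certbot renew --dry-run --nginx
      # A dry run changes nothing on the host.
      changed_when: false
```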

I pushed a commit with some suggested changes.

Thanks for doing the legwork on this!

Mar 18 '20 10:03 stevepiercy

I also added a cron job to automatically attempt to renew the certificates in https://github.com/plone/ansible-playbook/commit/f0cc209d0e2a1ba95eb7d5b0e6df644e40028665

Mar 18 '20 11:03 stevepiercy

Smoke test:

I tried adding, renewing and revoking certificates on a host using the certbot-nginx plugin while simultaneously hitting a static site with 100,000 sequential ab requests. I saw no failed requests and no latency greater than 10 ms.

Mar 18 '20 18:03 smcmahon

No cronjob is needed; the certbot-nginx package creates its own with a randomized run time.

I've created a pull request for the "simplified certbot" branch with stevepiercy's other changes.

Mar 18 '20 22:03 smcmahon