ansible-playbook
Add support for certbot and Let's Encrypt certificates
Closes #61.
See related documentation update in https://github.com/plone/training/pull/470.
This is a great idea, but we need to 1) make it optional, 2) document it in the ansible playbook docs (not just training) and 3) make it as close to foolproof as possible.
To make it optional, I suggest revising the "when" on the role operation to check the value of some new default variable like install_certbot. We don't want to install new packages and set up new cron jobs without giving the sysadmin an option.
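A minimal sketch of that opt-in switch, assuming the new variable is called install_certbot (the name floated above) and defaults to off. The play layout is illustrative only and is not taken from the playbook itself:

```yaml
---
# Sketch only: install_certbot is the proposed switch, not an existing variable.
- hosts: all
  become: true
  vars:
    install_certbot: false   # assumed default: off, so nothing new is installed
  roles:
    - role: geerlingguy.certbot
      # The role (packages, cron job) is skipped unless the sysadmin opts in.
      when: install_certbot | bool
```

With this shape, a sysadmin could opt in for a single run with `ansible-playbook playbook.yml -e install_certbot=true`, since extra vars take precedence over play vars.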
Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed. Otherwise, this isn't really complete. I have some code that does this that I'll take a look at for possible inclusion. It may not be compatible with Geerling's approach.
It would also be great if we could figure out how to activate it for a host with just an entry in webserver_virtualhosts -- rather than having separate entries in webserver_virtualhosts and certbot_certs. Again, I don't know if this is compatible with Geerling's role.
I can make the requested revisions. Why close the PR?
Closing was accidental. I meant to just comment.
I wonder if it wouldn't be better to separate the certbot operations into a separate playbook. This could be modeled on the firewall.yml playbook and could use the local-configure.yml file in the same way to pick up needed variables. This would loosen the coupling and help make it clear to the sysadmin that there are a variety of considerations to be taken into account in employing certbot.
I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)
Wouldn't the existing 'certificate_file' and 'key_file' options for a webserver_virtualhosts item do just as well? That's what I've been using already with my own certbot setup.
If that would work, that's one less option to be separately documented and maintained. And, no code changes needed in the nginx mode.
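For illustration, a virtual host entry along those lines might look like the sketch below. The keys and values are the ones that appear in the failing run quoted further down in this thread; example.com is a placeholder host.

```yaml
# local-configure.yml (sketch; example.com is a placeholder)
webserver_virtualhosts:
  - hostname: example.com
    default_server: true
    zodb_path: /Plone
    port: 443
    protocol: https
    # Paths where certbot keeps the live certificate and key on the target host.
    certificate_file: /etc/letsencrypt/live/example.com/fullchain.pem
    key_file: /etc/letsencrypt/live/example.com/privkey.pem
```

As reported further down, the playbook at that point looked for these paths on the Ansible controller rather than on the target, so this entry alone was not sufficient.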
If the last two comments are adopted, then I think the right way to document the letsencrypt support would be in an added doc in docs. An example in the training docs is also a great idea, of course.
Thanks!
I wonder if it wouldn't be better to separate the certbot operations into a separate playbook.
That makes sense, especially considering the step to stop and start the web server so that certbot can run its own server to generate the first-time cert. There's a few more things going on, too.
I see no need for a separate 'lets_encrypt_certificate' variable. (Perhaps I'm missing something.)
This was so that we could continue to support copying certificates. When certbot installed the LE cert, it warned me not to move them, and I don't know if copying would be harmful. I used the variable lets_encrypt_certificate to avoid copying them.
From the README:
WARNING: DO NOT MOVE THESE FILES!
Certbot expects these files to remain in this location in order
to function properly!
We recommend not moving these files. For more information, see the Certbot
User Guide at https://certbot.eff.org/docs/using.html#where-are-my-certificates.
Anyway, I tried your suggestion, but it failed. These are symlinks to the actual files. example.com is a mask for the actual host.
failed: [example.com] (item={'hostname': 'example.com', 'default_server': True, 'zodb_path': '/Plone', 'address': '157.245.228.22', 'port': 443, 'protocol': 'https', 'certificate_file': '/etc/letsencrypt/live/example.com/fullchain.pem', 'key_file': '/etc/letsencrypt/live/example.com/privkey.pem'}) => {"ansible_loop_var": "item", "changed": false, "item": {"address": "157.245.228.22", "certificate_file": "/etc/letsencrypt/live/example.com/fullchain.pem", "default_server": true, "hostname": "example.com", "key_file": "/etc/letsencrypt/live/example.com/privkey.pem", "port": 443, "protocol": "https", "zodb_path": "/Plone"}, "msg": "Could not find or access '/etc/letsencrypt/live/example.com/fullchain.pem' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
How do you manage the certbot certs?
Regarding making it foolproof: We need to have the certificate renewal also restart nginx if needed.
Certbot can handle that, including with the standalone option:
certbot renew --pre-hook "service nginx stop" --post-hook "service nginx start"
... Hooks will only be run if a certificate is due for renewal, so you can run the above command frequently without unnecessarily stopping your webserver.
So this example value can go in the local-configure.yml:
certbot_auto_renew_options: '--quiet --no-self-upgrade
--pre-hook "service nginx stop" --post-hook "service nginx start"'
I think I got it, but there might be a chicken-and-egg problem with restarting nginx. I have not verified yet on a clean machine, but here are my assumptions:
- I need to run playbook.yml to install and configure nginx for certbot.
- However nginx will not restart with the webserver_virtualhosts required to use geerlingguy.certbot because I have not yet run certbot.
- I have to run playbook.yml, then geerlingguy.certbot, then playbook.yml once more to complete everything else.
Here's the process outline, after a VM is set up and has a non-root user. Would you please review and let me know whether I should change it? I'm a hack at this Ansible stuff.
- Configure local-configure.yml with webserver_virtualhosts either with certbot as documented (but in a separate file, PR coming) or without certbot.
- Optionally install geerlingguy.certbot and configure.
  - cd ansible-playbook
  - git clone https://github.com/geerlingguy/ansible-role-certbot.git geerlingguy.certbot
  - geerlingguy.certbot.yml is already configured for use, but may be edited.
- Run playbook.yml, then geerlingguy.certbot, then playbook.yml.
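The three-pass sequence in the last step could in principle be captured in a small wrapper playbook. This is only a sketch: the wrapper file name is hypothetical, and the two imported file names are the ones used in this thread.

```yaml
---
# site-with-certbot.yml (hypothetical wrapper file name)
# Pass 1: install and configure nginx and the rest of the stack.
- import_playbook: playbook.yml
# Pass 2: obtain the certificate with the geerlingguy.certbot role.
- import_playbook: geerlingguy.certbot.yml
# Pass 3: finish configuration now that the certificate files exist.
- import_playbook: playbook.yml
```

One caveat: if the first pass fails for a host, for example because nginx cannot restart without the certificate, Ansible stops processing that host, so in practice the three runs may still need to be launched separately.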
Commits and PRs coming shortly.
Updated https://github.com/plone/training/pull/470
I added docs in docs. I can also move the additions for LE and certbot from webserver.rst into a separate file.
@smcmahon ready for review. I will test this out later tonight on a clean Digital Ocean VM.
I tried this out on a clean DO VM, but I had to manually stop nginx, run the command to create the cert, and restart nginx. I don't know why the role geerlingguy.certbot did not do this, but I suspect it might be due to how variables are parsed by Ansible. Is there some way to debug or get more information about variables that are actually used?
Why, yes, there is a debug method for Ansible.
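As a sketch of what that can look like, the tasks below use Ansible's built-in debug module to print the values that were actually resolved. The variable names are the ones mentioned in this thread; the tasks would need to be added to the play whose variables are in question.

```yaml
# Tasks to add to the play whose variables you want to inspect (sketch).
- name: Show the value Ansible actually resolved for one variable
  debug:
    var: certbot_auto_renew_options

- name: Show another variable, tolerating the case where it is undefined
  debug:
    msg: "certbot_certs = {{ certbot_certs | default('UNDEFINED') }}"
```

Running `ansible-playbook` with `-v` (or `-vvv`) also prints more detail about each task's arguments and results.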
I realized that defaults in the role were not getting overridden by those in my local-configure.yml, so I moved them into the playbook geerlingguy.certbot.yml instead, and that yielded success.
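A rough sketch of what moving the overrides into the playbook can look like. Only certbot_auto_renew_options and certbot_certs are named in this thread. The other variable names are given as I recall them from the role's README and should be checked against the version of the role in use; all values are placeholders.

```yaml
---
# geerlingguy.certbot.yml (sketch; values are placeholders)
- hosts: all
  become: true
  vars:
    # Assumed role variables: verify names against the role's README.
    certbot_create_if_missing: true
    certbot_admin_email: admin@example.com
    certbot_certs:
      - domains:
          - example.com
    # From this thread: hooks so that renewal stops and starts nginx.
    certbot_auto_renew_options: >-
      --quiet --no-self-upgrade
      --pre-hook "service nginx stop"
      --post-hook "service nginx start"
  roles:
    - geerlingguy.certbot
```

Play-level vars take precedence over role defaults, which is consistent with the overrides taking effect once they were moved here.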
I have to do more revisions to this PR, so please hold off merging until I can finish testing.
I've hit a roadblock, and I don't know how to fix it. Varnish returns an error message:
Error 503 Backend fetch failed
Backend fetch failed
Guru Meditation:
XID: 27
Varnish cache server
I'm using Python 3 in my playbook.yml. It completes after 3 runs. Along the way:
RUN 1
TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
fatal: [plone-demo.stevepiercy.com]: FAILED! => {"changed": true, "cmd": "supervisorctl stop zeoserver_memmon; supervisorctl remove zeoserver_memmon", "delta": "0:00:00.768966", "end": "2019-11-20 00:39:08.201577", "msg": "non-zero return code", "rc": 1, "start": "2019-11-20 00:39:07.432611", "stderr": "", "stderr_lines": [], "stdout": "zeoserver_memmon: ERROR (no such process)\nERROR: no such process/group: zeoserver_memmon", "stdout_lines": ["zeoserver_memmon: ERROR (no such process)", "ERROR: no such process/group: zeoserver_memmon"]}
RUN 2
The previous issue seems to be resolved, but the next one crops up.
TASK [plone.plone_server : Supervisor task list is updated and we have a memmon] **************************************************************************************************************************************************************************************
skipping: [plone-demo.stevepiercy.com]
...
TASK [plone.plone_server : Create initial Plone site] *****************************************************************************************************************************************************************************************************************
[WARNING]: Module remote_tmp /home/plone_daemon/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
fatal: [plone-demo.stevepiercy.com]: FAILED! => {
"changed": true,
"cmd": [
"bin/client_reserved",
"run",
"scripts/addPloneSite.py"
],
"delta": "0:00:02.518976",
"end": "2019-11-20 01:01:17.811993",
"msg": "non-zero return code",
"rc": 1,
"start": "2019-11-20 01:01:15.293017",
"stderr": "Traceback (most recent call last):\n File \"bin/client_reserved\", line 266, in <module>\n + sys.argv[1:]))\n File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main\n func = ep.load()\n File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load\n return self.resolve()\n File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve\n module = __import__(self.module_name, fromlist=['__name__'], level=0)\n File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>\n import zc.monitor\n File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59\n except Exception, v:\n ^\nSyntaxError: invalid syntax",
"stderr_lines": [
"Traceback (most recent call last):",
" File \"bin/client_reserved\", line 266, in <module>",
" + sys.argv[1:]))",
" File \"/usr/local/plone-5.2/buildout-cache/eggs/plone.recipe.zope2instance-6.3.0-py3.6.egg/plone/recipe/zope2instance/ctl.py\", line 993, in main",
" func = ep.load()",
" File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2434, in load",
" return self.resolve()",
" File \"/usr/local/plone-5.2/zeoserver/lib/python3.6/site-packages/pkg_resources/__init__.py\", line 2440, in resolve",
" module = __import__(self.module_name, fromlist=['__name__'], level=0)",
" File \"/usr/local/plone-5.2/buildout-cache/eggs/five.z2monitor-0.2-py3.6.egg/five/z2monitor/__init__.py\", line 19, in <module>",
" import zc.monitor",
" File \"/usr/local/plone-5.2/buildout-cache/eggs/zc.monitor-0.3.1-py3.6.egg/zc/monitor/__init__.py\", line 59",
" except Exception, v:",
" ^",
"SyntaxError: invalid syntax"
],
"stdout": "",
"stdout_lines": []
}
This appears to be a Python 3 incompatibility.
https://github.com/zopefoundation/zc.monitor/blob/master/src/zc/monitor/__init__.py#L59
Should be:
except Exception as v:
I tried editing my server's copy of that file, and running the playbook one more time, but that had no effect on Varnish. I have been able to reliably reproduce this issue on clean DO VMs.
Can anyone point me in the right direction to troubleshoot this further?
As a sanity check, I dropped back to Python 2 for the install, and there was no Varnish error.
I submitted a PR for the zc.monitor issue. Hopefully that resolves the issue with the error reported by Varnish.
2 months after setting up a new Plone instance with this configuration, auto renewal fails.
Problem binding to port 80: Could not bind to IPv4 or IPv6.. Skipping.
I don't want to go in manually every 3 months to stop nginx, run certbot, and restart nginx. How do other folks handle letsencrypt automatic renewal?
Personally, I have this in root's crontab on all my hosts:
@monthly /usr/bin/certbot renew --post-hook "service nginx restart"
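If that crontab entry were to be managed by Ansible rather than by hand, a sketch using the built-in cron module could look like this. The play layout and the job name are illustrative; the command is the one quoted above.

```yaml
---
- hosts: all
  become: true
  tasks:
    - name: Renew Let's Encrypt certificates monthly from root's crontab
      cron:
        name: certbot renew          # label Ansible uses to find and update the entry
        user: root
        special_time: monthly        # equivalent to the @monthly nickname
        job: '/usr/bin/certbot renew --post-hook "service nginx restart"'
```

This only installs the schedule. Whether the renewal itself succeeds still depends on which certbot authenticator is in use, as the following comments discuss.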
@fulv I tried running the command you have in cron, but it returns the same error message. Do you have the standalone version of certbot?
I scoured letsencrypt's community for answers, but all I found was to manually stop the webserver so that certbot could then bind to port 80 and renew the certificate.
@stevepiercy that's also what we do. We have a playbook that
- stops what's running on port 80 (nginx in our case)
- runs renew
- restarts nginx
a bit clunky and not 100% uptime, but kinda works. Anything else requires that you have scripted access to your DNS provider, which we don't. (If you do, you can script to update the DNS authentication method of certbot)
@polyester I'm in the same boat. No DNS hooks for LE. I'm using nginx, per the defaults of this playbook. Can you share a sanitized version of your playbook? It sounds like you still need to manually run it once per quarter, though, but at least it would save a few manual steps. I can deal with that.
@stevepiercy it really is the simplest playbook, basically
- name: stop nginx service
  service: name=nginx state=stopped
- name: renew cert
  command: certbot renew
- name: start nginx service
  service: name=nginx state=started
which isn't very refined. Our ansible master is cronnable, but if you have to run it manually of course that is prone to forgetting. Maybe for the playbook having the cronjob on the target do the "stop nginx, renew certificate, start nginx" would be sufficient? (although of course Ansible complains louder and better if for whatever reason nginx doesn't come up again...)
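One possible refinement of the three tasks above, offered as a sketch rather than as anyone's actual playbook: wrapping the stop and renew steps in a block with an always section means nginx is started again even when the renewal command fails.

```yaml
---
- hosts: all
  become: true
  tasks:
    - name: Renew certificates while nginx is stopped
      block:
        - name: Stop nginx so certbot's standalone server can bind to port 80
          service:
            name: nginx
            state: stopped

        - name: Renew any certificates that are due
          command: certbot renew
      always:
        # Runs whether or not the tasks in the block succeeded.
        - name: Start nginx again, even if the renewal failed
          service:
            name: nginx
            state: started
```

If the renewal fails, the play still reports the failure after the always section has run, so the louder complaint from Ansible is preserved.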
I use the nginx version of certbot.
Fulvio
I have https://mailinabox.email running and it renews the certs without my quasi-human intervention, if you're looking for possible examples
@tkimnguyen yes, please! I still haven't figured this one out.
@tkimnguyen reping. I'd like to see your example.
@stevepiercy this is what the certbot docs say about not stopping the webserver during the certificate issuance process: https://certbot.eff.org/docs/using.html#webroot
@tkimnguyen how do you use the webroot option with Plone? I can't figure out the value for --webroot-path for a Plone site. /var/www/html is the default path, but content is not served from there.
The current version of the certbot-nginx plugin is supposedly capable of issuing and renewing with no downtime. There's a discussion of how this is done in the thread at:
https://certbot.eff.org/faq#can-i-issue-a-certificate-without-bringing-down-my-web-server
with some supplementary information from nginx at:
https://www.nginx.com/faq/how-does-zero-downtime-configuration-testingreload-in-nginx-plus-work/
If this is acceptable, that plugin makes things dead simple. I've tried it out in a branch:
https://github.com/plone/ansible-playbook/tree/simplified-certbot
See https://github.com/plone/ansible-playbook/blob/simplified-certbot/docs/certbot.rst for quick documentation.
@smcmahon I checked out that branch, and added an entry to my local-configure.yml, then ran ansible-playbook -K certbot.yml. It ran along just fine, but ultimately failed at the step "Test renewal with a dry run." with the following error message:
certbot.errors.StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.
I then commented out certificate from local-configure.yml so that the roles/nginx/templates/host.j2 would use the certbot_hosts value. Still got the same error message.
I found that when I ssh in and issue the command certbot renew --dry-run --nginx, it works just fine. Without the --nginx flag, the command fails as it defaults to the standalone server.
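Expressed as an Ansible task, the working command might look like the sketch below. The task name mirrors the failing step mentioned above, but the actual task in the branch is not shown in this thread, so the layout here is an assumption.

```yaml
---
- hosts: all
  become: true
  tasks:
    - name: Test renewal with a dry run, using the nginx plugin
      # --nginx selects the nginx authenticator instead of the standalone
      # server, which cannot bind to port 80 while nginx is running.
      command: certbot renew --dry-run --nginx
      changed_when: false   # a dry run does not change anything on the host
```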
I pushed a commit with some suggested changes.
Thanks for doing the legwork on this!
I also added a cron job to automatically attempt to renew the certificates in https://github.com/plone/ansible-playbook/commit/f0cc209d0e2a1ba95eb7d5b0e6df644e40028665
Smoke test:
I tried adding, renewing and revoking certificates on a host using the certbot-nginx plugin while simultaneously hitting a static site with 100,000 sequential ab requests. I saw no failed requests and no latency greater than 10 ms.
No cronjob is needed; the certbot-nginx package creates its own with a randomized run time.
I've created a pull request for the "simplified certbot" branch with stevepiercy's other changes.