Reload Configuration on Signal
Is your feature request related to a problem? Please describe. For implementing #184, I need a way to instruct trow to gracefully reload its certificates and accept new connections on a new SSL context.
Describe the solution you'd like Since rollover happens well ahead of certificate expiry (at roughly 80% of the TTL), there is no urgency, which is why current connections can be phased out normally.
It is not necessary for trow to detect those changes itself, although this is a possibility, should it be easier to implement. Probably not, since safeguards would have to be put in place to avoid race conditions while reloading the different files.
Rather, the most portable solution appears to be exposing a configuration API stub that triggers a reload of the TLS context from disk (e.g. on POST). Maybe something simple, similar to the healthcheck endpoints, can be conceived, or alternatively a special configuration port that is not exposed outside of the pod context, for some basic shielding.
Describe alternatives you've considered Kill and restart. This implies service interruptions.
Additional context
- #184
- SPIFFE workload attestation
- If I knew Rust, I would already have gotten my hands dirty. I'll try to focus on #184 instead, on the assumption that this reloading feature will land soonish.
Yeah, reloading makes sense. I think the Unix way is to respond to a SIGHUP, right?
> Yeah, reloading makes sense. I think the Unix way is to respond to a SIGHUP, right?
Indeed, if that were implemented, it would play very well with https://github.com/spiffe/spiffe-helper#readme as its process wrapper / reload manager. If somebody could lend a hand with this in the coming days, it would be a perfect coordination of efforts. :wink:
How often does certificate rotation occur? The way that occurs to me to implement it would probably incur some downtime:
- Set the k8s readiness endpoint to fail to stop new traffic
- Wait for current connections to finish (could be a while if large upload/download, which is what worries me)
- Reload config, effectively restart server
- Set readiness to ok
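The steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not trow's code: `ServerState`, `drain_and_reload`, and the drain timeout are hypothetical names and parameters, and the `reload` closure stands in for whatever actually restarts the server.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical shared server state; none of these names come from trow.
pub struct ServerState {
    ready: AtomicBool,
    active_connections: AtomicUsize,
}

impl ServerState {
    pub fn new() -> Self {
        Self {
            ready: AtomicBool::new(true),
            active_connections: AtomicUsize::new(0),
        }
    }

    /// What a readiness endpoint would report.
    pub fn is_ready(&self) -> bool {
        self.ready.load(Ordering::SeqCst)
    }

    pub fn connection_opened(&self) {
        self.active_connections.fetch_add(1, Ordering::SeqCst);
    }

    pub fn connection_closed(&self) {
        self.active_connections.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Fails readiness, waits for in-flight connections (up to `drain_timeout`),
/// runs `reload`, then restores readiness.
/// Returns true if every connection finished before the timeout.
pub fn drain_and_reload(
    state: &ServerState,
    drain_timeout: Duration,
    reload: impl FnOnce(),
) -> bool {
    // Step 1: stop new traffic by failing the readiness check.
    state.ready.store(false, Ordering::SeqCst);

    // Step 2: wait for current connections to finish, bounded by the timeout.
    let deadline = Instant::now() + drain_timeout;
    let mut drained = true;
    while state.active_connections.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            drained = false;
            break;
        }
        thread::sleep(Duration::from_millis(10));
    }

    // Step 3: reload config (in practice, restart the server).
    reload();

    // Step 4: accept traffic again.
    state.ready.store(true, Ordering::SeqCst);
    drained
}
```

The timeout is an assumption added for the sketch: without one, a single large upload or download could block the reload indefinitely, which is the concern raised in the second step.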
We're actually in the middle of a refactoring that will replace the underlying framework, so it might be an idea to complete that before moving to this task.
I'm going to give this a more generic title, as I think we should be able to handle reloading all config.
> How often does certificate rotation occur?
The SPIRE default is every 5 minutes, but it's configurable, and users are expected to tweak it to strike a good balance between their security requirements and service performance.
I think reloading the whole server with downtime is relatively straightforward. I've done exactly this for other solutions that do not support certificate reloading. However, I don't think it is a good fit for a canonical implementation.
There has even been a discussion on the OpenSSL mailing list about the topic.
https://www.mail-archive.com/[email protected]/msg88596.html
The conclusion was more or less:
- keeping running contexts around
- only use new certs for new connections
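As a rough illustration of that conclusion, the sketch below keeps the active context behind a lock and hands each new connection a reference-counted handle. It uses only the standard library; `TlsContext` and `ContextStore` are hypothetical stand-ins, not types from OpenSSL, Rocket, or trow.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for a real TLS context (e.g. a parsed certificate chain and key).
/// Hypothetical: a real server would hold its TLS library's config type here.
#[derive(Debug, PartialEq)]
pub struct TlsContext {
    pub cert_pem: String,
}

/// Holds the context that is handed to *new* connections.
pub struct ContextStore {
    current: RwLock<Arc<TlsContext>>,
}

impl ContextStore {
    pub fn new(initial: TlsContext) -> Self {
        Self {
            current: RwLock::new(Arc::new(initial)),
        }
    }

    /// Called once per accepted connection. The returned handle keeps that
    /// context alive for as long as the connection holds on to it.
    pub fn for_new_connection(&self) -> Arc<TlsContext> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Swaps in a freshly loaded context. Existing connections are untouched;
    /// the old context is freed when the last of them drops its handle.
    pub fn replace(&self, next: TlsContext) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

With this shape, a connection opened before `replace` keeps using the old certificate until it closes, while every connection accepted afterwards gets the new one. Whether the underlying framework lets the acceptor consult such a store per connection is a separate question.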
Thanks. The trouble is that's pretty low-level stuff, and I'm not sure how much I can control it with the current frameworks.
It does also bring up an alternative solution - monitoring the cert file and automatically reloading if it changes. If it is easily possible to "keep running contexts around" that may be a better solution, but I'm still leaning towards using signals (which implies a restart and complete config reload).
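For the file-monitoring alternative, a minimal polling sketch might look like the following. It uses only the standard library and compares file contents between polls; `CertWatcher` and `poll` are hypothetical names, and a real implementation might prefer OS file notifications instead of polling.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical watcher that remembers the last contents it saw for one file.
pub struct CertWatcher {
    path: PathBuf,
    last_seen: Option<Vec<u8>>,
}

impl CertWatcher {
    pub fn new(path: impl AsRef<Path>) -> Self {
        Self {
            path: path.as_ref().to_path_buf(),
            last_seen: None,
        }
    }

    /// Reads the file and returns `Some(bytes)` if the contents differ from
    /// the previous poll (the first poll always returns the contents),
    /// or `None` if nothing changed.
    pub fn poll(&mut self) -> io::Result<Option<Vec<u8>>> {
        let bytes = fs::read(&self.path)?;
        if self.last_seen.as_deref() == Some(bytes.as_slice()) {
            return Ok(None);
        }
        self.last_seen = Some(bytes.clone());
        Ok(Some(bytes))
    }
}
```

A caller would invoke `poll` on a timer and trigger a reload whenever it returns `Some`. This sketch does not address the race mentioned earlier in the thread: if the certificate and key live in separate files, a poll can land between the two writes and observe a mismatched pair.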
I came to the conclusion that if time and budget are to be spent on this issue, the result should ultimately be made available upstream: https://github.com/SergioBenitez/Rocket/issues/1448
It looks as if this is the Rocket framework's implementation of TLS. I kind of get it, though this is my first time reading Rust code.
- https://github.com/SergioBenitez/Rocket/blob/master/core/http/src/tls.rs
I've set up a testbed here: https://github.com/ContainerSolutions/trow/pull/193 → The TTL can be set to something like 30 seconds here: https://github.com/ContainerSolutions/trow/pull/193/files#diff-ac309bd9e52a2419f8aaff3203228458fbaec4f7336192cf4f4ec269ec7befd3R7