consul-template

Simultaneous SSL update on all instances.

Open scor2k opened this issue 2 years ago • 3 comments

Hi!

We use consul-template with Vault PKI to provide SSL certificates for a MySQL Galera cluster. While testing with a short certificate TTL (15m), we hit a problem: the Galera cluster crashed because the certificates were regenerated on all nodes simultaneously (our reload script sends ALTER INSTANCE RELOAD TLS; each time a new certificate is issued).

We also hit the same issue with an Apache Kafka cluster (with SSL), where the TTL was 7 days. Admittedly, it happened only once in a month, but it did happen.

As a workaround, we offset the TTL by 1 day for each successive node. It reduces the chance of a collision, but it's not a real fix.
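To see why a fixed offset only postpones the collision, here is a rough sketch of the arithmetic. It assumes a simplified model that consul-template does not guarantee: all nodes start at the same moment and each renews at a fixed interval proportional to its TTL. Under that assumption, two nodes with intervals p1 and p2 renew together again after lcm(p1, p2). The helper names gcd and lcm are hypothetical.

```shell
#!/bin/bash
# Illustrative arithmetic only; gcd and lcm are hypothetical helper names.
# Assumption: all nodes start together and renew at a fixed interval.

# Greatest common divisor via the Euclidean algorithm.
gcd() {
    local a=$1 b=$2 t
    while (( b != 0 )); do
        t=$(( a % b ))
        a=$b
        b=$t
    done
    echo "$a"
}

# Least common multiple: the first time two fixed intervals line up again.
lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

# Renewal intervals in days for three nodes offset by one day each (7, 8, 9).
echo "nodes 1 and 2 coincide after $(lcm 7 8) days"
echo "all three coincide after $(lcm "$(lcm 7 8)" 9) days"
```

In this model the first two nodes line up again after 56 days and all three after 504 days, so the offset makes a collision rarer without ruling it out.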

My question is simple: is there a way to use some kind of distributed lock (via Consul) to prevent all instances from updating their certificates at the same time?

mysqld config x 3 instances

$ cat my.cnf
[client]
port = 33306
socket = /tmp/mysql.sock
default-character-set = utf8

[mysqld]
pxc_encrypt_cluster_traffic=ON
user = mysql
ssl-ca = /opt/mysql/tls/server/server-ca.pem
ssl-cert = /opt/mysql/tls/server/server-cert.pem
ssl-key = /opt/mysql/tls/server/server-key.pem
...

consul-template configs x 3 instances

$ cat conf/consul-template.hcl
vault {
  address = "https://127.0.0.1:8200"
  
  unwrap_token = false
  renew_token  = true
  
  lease_renewal_threshold = 0.5

  ssl {
    enabled = true
    verify = true
    ca_path = "/opt/consul-template/tls/server-CA.cert"
    cert = "/opt/consul-template/tls/consul-template.cert"
    key = "/opt/consul-template/tls/consul-template.key"
    server_name = "127.0.0.1"
  }
}

# MYSQL
template {
  source = "/opt/consul-template/templates/mysql/server-ca.pem.tpl"
  destination = "/opt/mysql/tls/server/server-ca.pem"
  perms = 0640
  command = "/opt/consul-template/templates/mysql/reload.sh"
  error_on_missing_key = true
  left_delimiter  = "[["
  right_delimiter = "]]"
}
template {
  source = "/opt/consul-template/templates/mysql/server-cert.pem.tpl"
  destination = "/opt/mysql/tls/server/server-cert.pem"
  perms = 0640
  command = "/opt/consul-template/templates/mysql/reload.sh"
  error_on_missing_key = true
  left_delimiter  = "[["
  right_delimiter = "]]"
}
template {
  source = "/opt/consul-template/templates/mysql/server-key.pem.tpl"
  destination = "/opt/mysql/tls/server/server-key.pem"
  perms = 0640
  command = "/opt/consul-template/templates/mysql/reload.sh"
  error_on_missing_key = true
  left_delimiter  = "[["
  right_delimiter = "]]"
}

$ cat /opt/consul-template/templates/mysql/reload.sh
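#!/bin/bash
# Ask mysqld to reload its TLS material after consul-template renders a new file.

set -eo pipefail

MYSQL='/opt/mysql/current/bin/mysql'
SOCKET='/tmp/mysql.sock'

# Skip the reload when the client binary or the server socket is missing
# (for example, while mysqld is stopped).
if [[ -f "$MYSQL" && -S "$SOCKET" ]]; then
    # With `set -e`, a failed reload aborts the script with mysql's exit code,
    # so no separate STATUS variable is needed.
    "$MYSQL" -u root -p'super-secure-password' -e "ALTER INSTANCE RELOAD TLS;"
fi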

scor2k avatar Jan 31 '23 19:01 scor2k

Why not add sleep $((1 + $RANDOM % 360)); to your mysql/reload.sh command? (Obviously, you can adjust the 360 seconds to whatever suits your needs.) Your certificates will be rotated before they expire, so you do not have to reload right away when the new ones are generated.
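A minimal sketch of that jitter as a reusable helper, in case it is easier to drop into reload.sh. The names jitter_seconds and MAX_JITTER are hypothetical, and the 360-second bound is just the value from the suggestion above.

```shell
#!/bin/bash
# Sketch: random delay before the reload. jitter_seconds and MAX_JITTER
# are hypothetical names, not part of consul-template or MySQL.

MAX_JITTER="${MAX_JITTER:-360}"   # upper bound in seconds; tune to your TTL

jitter_seconds() {
    local max="${1:-$MAX_JITTER}"
    # RANDOM is a bash built-in yielding 0..32767, so this gives 1..max
    # for any max up to 32768.
    echo $(( 1 + RANDOM % max ))
}

# In reload.sh, before the ALTER INSTANCE RELOAD TLS call:
#   sleep "$(jitter_seconds)"
```

One caveat to check against your consul-template version: template commands are subject to a command_timeout (30s by default, as far as I recall), so a sleep longer than that would need the timeout raised in the template stanza.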

komapa avatar Feb 06 '23 04:02 komapa

Thank you for your reply, @komapa. Yes, it's a possible solution, but it guarantees nothing. We did something similar by offsetting each successive node's TTL by an hour or a day, but that won't protect us in the case of bad luck either.

scor2k avatar Feb 09 '23 02:02 scor2k

I think you pretty much said this yourself? Wrap the reload command in a Consul lock:

command = "consul lock -child-exit-code /some/consul/path/prefix /opt/consul-template/templates/mysql/reload.sh"

See: https://developer.hashicorp.com/consul/commands/lock
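If the inline command gets unwieldy, the same idea can live in a small wrapper script. This is only a sketch: locked_reload and LOCK_PREFIX are hypothetical names, and service/mysql/tls-reload is a placeholder KV prefix that would have to be identical on every node and writable with the token the local agent uses.

```shell
#!/bin/bash
# Sketch of a wrapper that serializes the reload across nodes via consul lock.
# LOCK_PREFIX and locked_reload are hypothetical names; the KV prefix below is
# a placeholder and must be the same on every node.

LOCK_PREFIX="${LOCK_PREFIX:-service/mysql/tls-reload}"
RELOAD_SCRIPT="${RELOAD_SCRIPT:-/opt/consul-template/templates/mysql/reload.sh}"

locked_reload() {
    # -child-exit-code makes consul lock exit non-zero when the child fails,
    # so consul-template still sees a failed reload.
    consul lock -child-exit-code "$LOCK_PREFIX" "$RELOAD_SCRIPT"
}

# Call locked_reload at the end of the wrapper and point the template's
# command at the wrapper instead of reload.sh.
```

The lock is held only while the child command runs. If the other nodes should wait until this one has fully rejoined the cluster, that check would need to happen inside the locked script before it exits.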

komapa avatar Feb 13 '23 21:02 komapa