Zero downtime password rotations
Rebase of #389
Problem
Sometimes passwords leak, and sometimes security teams want infrastructure teams to rotate passwords. With Postgres, that's currently impossible without taking down the application or using third-party tools like Vault. If one were to change the password today, all new connections would be denied, causing a production incident.
Solution
This PR introduces the ability to use multiple passwords (called secrets) to connect to PgCat while one secret is being deprecated and replaced with another. Each database <--> user <--> secret triplet gets its own connection pool (previously, pools were keyed only by database <--> user, like PgBouncer).
Creating separate pools is a good idea because it allows us to:
- Separate clients with the old password from clients with the new password in the admin database, so we can track the progress of the password rotation.
- Forcibly disconnect clients that are using an old password by shutting down their pool.
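A minimal sketch of what keying pools by the triplet could look like. The names PoolIdentifier, Pool and PoolRegistry are illustrative assumptions, not PgCat's actual internals, and the pool here only counts clients instead of holding real connections.

```rust
use std::collections::HashMap;

/// Hypothetical pool key: one pool per (database, user, secret) triplet.
/// `secret` is `None` for clients that authenticate with the regular password.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PoolIdentifier {
    pub database: String,
    pub user: String,
    pub secret: Option<String>,
}

impl PoolIdentifier {
    pub fn new(database: &str, user: &str, secret: Option<&str>) -> Self {
        PoolIdentifier {
            database: database.to_string(),
            user: user.to_string(),
            secret: secret.map(|s| s.to_string()),
        }
    }
}

/// Stand-in for a real connection pool; it only tracks connected clients.
#[derive(Debug, Default)]
pub struct Pool {
    pub clients: usize,
}

/// Hypothetical registry mapping each triplet to its own pool.
#[derive(Debug, Default)]
pub struct PoolRegistry {
    pools: HashMap<PoolIdentifier, Pool>,
}

impl PoolRegistry {
    /// Register a client in the pool for its triplet, creating the pool if needed.
    pub fn connect(&mut self, id: PoolIdentifier) {
        self.pools.entry(id).or_default().clients += 1;
    }

    /// Number of clients on a given triplet; useful for tracking rotation progress.
    pub fn client_count(&self, id: &PoolIdentifier) -> usize {
        self.pools.get(id).map_or(0, |pool| pool.clients)
    }

    /// Shut down one pool and return how many clients were disconnected.
    pub fn shutdown(&mut self, id: &PoolIdentifier) -> usize {
        self.pools.remove(id).map_or(0, |pool| pool.clients)
    }
}
```

Because the secret is part of the key, clients on the old secret can be counted and disconnected without touching the pool used by clients that have already switched.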
Implementation caveats
All Postgres authentication mechanisms except plain text obfuscate the secret (password) being used, so without knowing more, we need to test all configured passwords. Additionally, we can't (I think) come up with a unique pool identifier using a hashed password, since the identifier has to be deterministic and the hashes clients send are not: with MD5 auth, for example, the server sends a random salt for each connection, so the same password produces a different hash every time.
So, for this feature to work, we need to use plain text authentication. Of course that will set off alarm bells with most people, since this method is not secure by itself (neither is MD5, but that's out of scope at the moment). So, we only allow this mechanism to work if PgCat is configured to use TLS connections. Using TLS and plain text passwords together is safe and used everywhere across the Internet today. If it's good enough for the banks, it's good enough for us.
Postgres docs on plain auth: https://www.postgresql.org/docs/15/auth-password.html
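The check described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: AuthResult and authenticate are hypothetical names, and the comparison is a plain string equality where production code would likely prefer a constant-time comparison.

```rust
/// Outcome of checking a client's plain-text password (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum AuthResult {
    /// The password matched the secret at this index in the configured list.
    Secret(usize),
    /// No configured secret matched.
    Denied,
    /// Plain-text authentication was refused because the connection is not TLS.
    TlsRequired,
}

/// Test the presented password against every configured secret,
/// but only when the client connection is protected by TLS.
pub fn authenticate(presented: &str, secrets: &[&str], tls: bool) -> AuthResult {
    if !tls {
        return AuthResult::TlsRequired;
    }
    // Assumption: simple equality for clarity; not constant-time.
    match secrets.iter().position(|s| *s == presented) {
        Some(index) => AuthResult::Secret(index),
        None => AuthResult::Denied,
    }
}
```

The matched index (or the secret itself) is what would select the pool, which is only possible because the plain-text password is visible to PgCat.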
Changes
pgcat.toml
An additional secrets = [ "one", "two", "three" ] option is added to the [users] section. This configures multiple passwords (and pools) for the user. The password option is used to connect to Postgres.
admin db
An additional secret column (redacted) is added to differentiate pool statistics.
pgcat=> show users;
     name      |  pool_mode  |   secret
---------------+-------------+-------------
 simple_user   | session     | <no secret>
 sharding_user | transaction | ****_one
 sharding_user | transaction | ****_two
 sharding_user | transaction | <no secret>
 other_user    | transaction | <no secret>
(5 rows)
pgcat=> show pools;
  database  |     user      |   secret    |  pool_mode  | cl_idle | cl_active | cl_waiting | cl_cancel_req | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us
------------+---------------+-------------+-------------+---------+-----------+------------+---------------+-----------+---------+---------+-----------+----------+---------+------------
 sharded_db | sharding_user | ****_two    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | sharding_user | ****_one    | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 sharded_db | other_user    | <no secret> | transaction |       0 |         0 |          0 |             0 |         0 |       6 |       0 |         0 |        0 |       0 |          0
 simple_db  | simple_user   | <no secret> | session     |       0 |         0 |          0 |             0 |         0 |       2 |       0 |         0 |        0 |       0 |          0
(5 rows)
Ops
To use this feature:
1. Add the new secret to secrets for the user, reload the config.
2. Change the password in all apps and redeploy.
3. Wait for the deploy to finish, remove the old secret from secrets, reload the config.
4. In quick succession: a) ALTER ROLE ... in Postgres to change the password, b) change password in config and reload.
Step 4 can be done with 0 errors if min_size for the pool is set to max_size, opening all connections in advance. This ensures no new connection to Postgres is made during step 4. Existing connections using the old password are not affected by ALTER ROLE.
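To make the ordering concrete, here is a toy model of the settings touched by the procedure above. UserConfig and its methods are hypothetical and only mirror the secrets and password options; the real steps are config reloads plus ALTER ROLE in Postgres, not calls like these.

```rust
/// Hypothetical model of the per-user settings involved in a rotation.
#[derive(Debug, Clone)]
pub struct UserConfig {
    /// Password PgCat uses to connect to Postgres.
    pub password: String,
    /// Secrets that clients may use to connect to PgCat.
    pub secrets: Vec<String>,
}

impl UserConfig {
    /// Whether a client presenting `secret` would be accepted.
    pub fn accepts(&self, secret: &str) -> bool {
        self.secrets.iter().any(|s| s.as_str() == secret)
    }

    /// Step 1: add the new secret; old and new are both accepted afterwards.
    pub fn add_secret(&mut self, secret: &str) {
        if !self.accepts(secret) {
            self.secrets.push(secret.to_string());
        }
    }

    /// Step 3: remove the old secret once every app has been redeployed.
    pub fn remove_secret(&mut self, secret: &str) {
        self.secrets.retain(|s| s.as_str() != secret);
    }

    /// Step 4b: update the password used towards Postgres,
    /// right after ALTER ROLE has been run there.
    pub fn set_password(&mut self, password: &str) {
        self.password = password.to_string();
    }
}
```

Between steps 1 and 3 both secrets are accepted, which is the window in which applications can be redeployed without connection errors.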
In Postgres, zero-downtime password rotations can be implemented by using two users that are both members of the same group:
- Clients use user 1
- Clients start using only user 2
- Password of user 1 is changed
- Clients start using only user 1
- Password of user 2 is changed
It's not very user friendly, but it's quite possible.
Hey, is this still being worked on?