hanko icon indicating copy to clipboard operation
hanko copied to clipboard

JWKS can reset every server restart

Open darrencrossley opened this issue 1 year ago • 3 comments

Checklist

  • [X] I could not find a solution in the existing issues or docs.
  • [X] I agree to follow this project's Code of Conduct.

Describe the bug

Posting here for information as much as a fix but we encountered a subtle bug in the way JWK tokens are checked when starting the server.

In our case we had wanted to rotate our secret key, replacing it with a new secret. As this was only a testing instance we simply removed the old JWK from the DB and updated the secrets.keys array with a new random string (we were actually moved from yml configuration to ENV var configuration as it's more convenient for us and it wasn't simple to get out our old key into the new array).

In doing this we thought we followed the spirit of the documentation, and after restarting Hanko received no error and a new JWK was generated and exposed on the wellknown endpoint.

Critically however we didn't reset the IDENTITY column on the JWKs table, at which point this logic caused a lot of headache:

https://github.com/teamhanko/hanko/blob/main/backend/crypto/jwk/manager.go#L39:

  // for every key we should check if a jwk with index exists and create one if not.
	for i := range keys {
		j, err := persister.Get(i + 1)
		if j == nil && err == nil {
			_, err := manager.GenerateKey()
			if err != nil {
				return nil, err
			}
		} else if err != nil {
			return nil, err
		}
	}

This fetches JWKs using the identity column of the db, assuming the order of the keys array remains the same and that the table starts at ID 1.

I think this will cause problems for several reasons:

  • people are going to edit the table like we did
  • configuration arrays are always going to get their orders mixed up (even when the documentation tells them to add items to the top of the configuration array as specified here: https://github.com/teamhanko/hanko/blob/main/backend/docs/Config.md
  • Simply removing old keys will also require people to re-index their keys (simply creating 3 keys and then expiring the oldest with ID1 also causes this same issue)

I'm not sure the right way to solve this (hence this long post) but I think storing some kind of hash of the original secret and using this as the ID would be the simplest way. But it's possible (not a crypto expert) this is dangerous for some subtle reason.

Another option could be a more managed process where only a single secret is specified and only used as an input to some random process. Key rotation and generation would then purely be a application concern and a CLI / api could be used to configure and manage rotation.

Failing that maybe a CLI command could be used to reindex and manage the removal?

Reproducing the bug

  1. Start the server with a valid configuration and secrets.keys set
  2. remove the key from the database without reseting the autoincrement jwks_id_seq in postgres
  3. restart the server with a new secrets.keys
  4. All will seem well and a new JWKS entry will be created critically with ID 2
  5. Restart the server again a new JWKS entry will be created with ID 3...

Logs

No response

Configuration

No response

Hanko Version

v0.5.0-0.7.0

OS Hanko Backend

Linux

OS Version Hanko Backend

No response

OS

None

OS Version

No response

Browser Version

No response

Environment

None

Additional Context

No response

darrencrossley avatar May 31 '23 17:05 darrencrossley

Hey Darren. Thanks for this in depth analysis! I think the initial way of thinking was that you don't remove anything out of the config but always add more keys to the array when rotating. But I can see that this is prone to error and the process can definitely be improved. I will take this into discussion with the team.

like-a-bause avatar Jun 01 '23 09:06 like-a-bause

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Jul 27 '23 02:07 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Aug 10 '23 02:08 github-actions[bot]