feat(testing): locksmith sa-keys-as-a-service
This PR addresses the need to provide an alternate identity in Cloud Build for making requests to private Cloud Run services. Every 28 days, a new service account key is generated and pushed into Secret Manager for use by Cloud Build pipelines. For details, see the README of the locksmith Cloud Function.
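A minimal sketch of the rotation flow described above, not locksmith's actual code. It assumes two hypothetical callables: `create_key`, a wrapper around the IAM key-creation call that returns the new key's JSON bytes, and `add_secret_version`, a wrapper around Secret Manager's add-version call that returns the new version's name. The 28-day period comes from the description; `ROTATION_PERIOD`, `rotation_due`, and `rotate` are illustrative names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Rotation period from the PR description; the constant name is illustrative.
ROTATION_PERIOD = timedelta(days=28)


def rotation_due(last_rotated: Optional[datetime], now: datetime,
                 period: timedelta = ROTATION_PERIOD) -> bool:
    """Return True if no key has been rotated yet or the period has elapsed."""
    if last_rotated is None:
        return True
    return now - last_rotated >= period


def rotate(create_key: Callable[[], bytes],
           add_secret_version: Callable[[bytes], str],
           last_rotated: Optional[datetime],
           now: Optional[datetime] = None) -> Optional[str]:
    """Create a new key and store it as a new secret version if rotation is due.

    Both callables are hypothetical wrappers around the IAM and Secret Manager
    clients. Returns the new secret version name, or None if nothing was done.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if not rotation_due(last_rotated, now):
        return None
    key_json = create_key()              # new service account key material
    return add_secret_version(key_json)  # pipelines read the latest version
```

The `rotation_due` guard assumes a hypothetical setup in which Cloud Scheduler invokes the function more often than every 28 days (for example, daily), so a missed or retried run is caught up on the next invocation instead of producing an extra key. Passing fakes for the two callables lets the flow be exercised without touching IAM or Secret Manager.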
TODOs
- [x] Decision: Is this automation worth the overhead? Yes.
- [x] Decision: Is this blocked on automating time and count based key expiration? No.
- [ ] TODO: Finish updating project setup script.
- [ ] TODO: Locksmith Automated Testing
- [ ] Locksmith Auto-deploy with CI
- [ ] Request: Security Review
Follow-ups
- [ ] New Cloud Function: Limit the number of service account keys to the two most recent
- [ ] Add alerting/buildcop on functions down
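The follow-up to keep only the two most recent keys reduces to a selection step. A sketch, assuming each key is summarized by a hypothetical `KeyInfo` record with its name, creation time, and whether it is user-managed (the IAM key listing reports a key name, a `validAfterTime` timestamp, and a key type, but this record type is illustrative). The deletion call itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class KeyInfo:
    """Hypothetical summary of one service account key."""
    name: str
    created: datetime   # e.g. parsed from the key's validAfterTime
    user_managed: bool  # system-managed keys are not ours to delete


def keys_to_delete(keys: List[KeyInfo], keep: int = 2) -> List[KeyInfo]:
    """Return the user-managed keys older than the `keep` most recent ones."""
    user_keys = [k for k in keys if k.user_managed]
    # Newest first, so everything after the first `keep` entries is surplus.
    user_keys.sort(key=lambda k: k.created, reverse=True)
    return user_keys[keep:]
```

Keeping two rather than one presumably leaves the previous key valid for any build that fetched its secret just before the latest rotation.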
I don't mind all the pieces, and I'd rather use automated key rotation than rotate keys manually. A few questions I have:
- How/where will this break?
- Will a cold function create errors with rotating keys?
- Do we care about creating a Scheduler alert?
- Who/how many people will be managing this process? Can contributors retrigger key addition, or only those with project access?
- Will we need to update Secret Manager over time?
- If the Cloud Build step fails will contributors know what went wrong and what is needed to rerun?
> How/where will this break?

If the function does not trigger within a couple of days of its schedule, the existing keys should expire. It is unclear to me whether we need to implement a separate "expiration" mechanism or whether IAM has something built in to handle that.
> Will a cold function create errors with rotating keys?

If the function does not trigger, the previous key stays in place until it expires. Once it expires, gcloud auth fails, and so does the build.
> Do we care about creating a Scheduler alert?
It's not clear to me whether we want a scheduler alert or an alert on the function, but I definitely think we want something that makes sure all automation is working as expected.
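Whichever alert is chosen, the underlying check can be the same: flag when the newest key is older than the rotation period plus some grace. A sketch, assuming the 28-day period from the description and a hypothetical two-day grace window standing in for "a couple of days"; `rotation_stale` is an illustrative name, and wiring the result to an alert or buildcop issue is left open.

```python
from datetime import datetime, timedelta
from typing import List

ROTATION_PERIOD = timedelta(days=28)  # from the PR description
GRACE = timedelta(days=2)             # hypothetical "couple of days"


def rotation_stale(key_created: List[datetime], now: datetime,
                   period: timedelta = ROTATION_PERIOD,
                   grace: timedelta = GRACE) -> bool:
    """Return True if no key exists or the newest one is overdue for rotation."""
    if not key_created:
        return True
    newest = max(key_created)
    return now - newest > period + grace
```

Checking the keys themselves, rather than whether the scheduler fired, would catch both a missed trigger and a function that ran but failed to create a key.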
> Who/how many people will be managing this process? Can contributors retrigger key addition, or only those with project access?

My expectation is that the function is mostly hands-off and reserved for maintainers. Any PR that's tested could expose the service account key simply by adding a cat statement to a build step, which is where our discussion on build security comes in.
> Will we need to update Secret Manager over time?
Not sure I follow the question.
> If the Cloud Build step fails, will contributors know what went wrong and what is needed to rerun?

I'm not sure this is different from any other failure. Once we have more experience with the error states, we can refine the error output and add an FAQ to the contributor guide.
This is a good repo to test this out in, then. As for the Secret Manager question: once this is set up, will we ever need to update the secret name or make other security-related rotations or changes?
Is this ready for review?