k8s.io
Refactor infra/gcp/...
Right now it is set up as "concept-first". For example "ensure-staging" says "all of these are staging-like" and "ensure-prod" says "all of these are prod-like". That makes it hard to get a sense of what any one project has going on.
I propose to refactor it to "project-first": one list of projects, where each one says "I am prod-like" or "I am staging-like". Then I could simply say `ensure-project k8s-foo-bar` and all of the properties would be asserted.
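A minimal sketch of what a project-first layout might look like in bash, assuming bash 4+ for associative arrays. The `PROJECT_KIND` table, the `k8s-foo-prod` project name, and the `ensure_*_properties` helpers are hypothetical names for illustration, not what infra/gcp contains; the helpers only echo what they would assert rather than calling gcloud.

```shell
#!/usr/bin/env bash
# Hypothetical project-first table: each project declares what kind it is.
# Names here are placeholders, not the real infra/gcp configuration.
declare -A PROJECT_KIND=(
    [k8s-foo-bar]="staging"
    [k8s-foo-prod]="prod"
)

# Stand-ins for the real per-kind assertions; they only print.
ensure_staging_properties() {
    echo "$1: assert staging-like properties"
}

ensure_prod_properties() {
    echo "$1: assert prod-like properties"
}

# Look up a single project and assert everything its kind implies.
ensure_project() {
    local project="$1"
    local kind="${PROJECT_KIND[$project]:-}"
    case "${kind}" in
        staging) ensure_staging_properties "${project}" ;;
        prod)    ensure_prod_properties "${project}" ;;
        *)       echo "unknown project: ${project}" >&2; return 1 ;;
    esac
}
```

With this shape, `ensure_project k8s-foo-bar` touches one project only, and reading the table answers "what does this project have going on?" in one place.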
This is very close to Terraform territory, but I don't know TF well enough to write the "utility" functions without so much duplication. @cblecker - is this worth pursuing?
Yes, it is absolutely worth pursuing. The bash ensure stuff is getting out of hand IMO.
Shall we discuss the idea sometime and you can either volunteer yourself or write enough that we can solicit other volunteers?
On Mon, Dec 16, 2019 at 3:00 PM Christoph Blecker [email protected] wrote:
Yes, it is absolutely worth pursuing. The bash ensure stuff is getting out of hand IMO.
@thockin just taking a walk through the issues, maybe this is related: https://github.com/kubernetes/k8s.io/pull/523
I also saw the discussion on the mailing list; in my opinion Terraform is the best way to do this :)
I just don't think I have enough knowledge of it to help with the Terraform stuff, but anyway just putting the PR here (again) so we may have a follow-up.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/area cluster-mgmt /area cluster-infra /kind cleanup
Having dipped my toes into adding to this mess:
- I hated concept-first at first, and thought project-first was a great idea
- I'm now less sure, as some concepts span projects (image promotion needs access to all these, prow/boskos will need access to all those)
- I suspect the parts of our shell scripts that are for loops could correspond well to terraform modules
- I just need to say out loud that I am still a little freaked out by terraform. It's this whole other ecosystem that has churn and will need to be kept current. I wasn't confident enough to try migrating the google provider from 2.x to 3.x for `aaa` without maybe accidentally blowing away the cluster. OTOH as we write more bash in `lib*.sh` files we're also creating our own ecosystem with possibly inconsistent naming, lack of testing, etc.
- I fell back to using a shell script for creating projects, but when I have time would be willing to see what rewriting that as terraform would be like
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
/remove-lifecycle rotten /lifecycle frozen /priority important-longterm
I don't have the bandwidth for this, and this issue is maybe too broad to stay open, but I think the point stands that:
- continuing with our bash as-is will deepen our tech debt
- we aren't comfortable enough with our bash to allow automation to run it
Whatever we use, even if it's bash, we need:
- tests to enforce conventions
- tests that build trust in our ability to refactor
- tests that build trust in our ability to have automation run this
- confirmation of what changes will result
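One way the "confirmation of what changes will result" requirement could be approximated while staying in bash is a dry-run wrapper around every mutating command. This is only a sketch under that assumption: `DRY_RUN` and `run_or_print` are hypothetical names, not existing helpers in infra/gcp.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run switch; defaults to the safe setting.
DRY_RUN="${DRY_RUN:-true}"

# Print a mutating command instead of running it unless DRY_RUN=false.
# Arguments are shell-quoted with %q so the printed line can be copy-pasted.
run_or_print() {
    if [[ "${DRY_RUN}" == "true" ]]; then
        printf 'would run:'
        printf ' %q' "$@"
        printf '\n'
    else
        "$@"
    fi
}
```

A script would then call, for example, `run_or_print gcloud projects create k8s-foo-bar`, and a reviewer could read the printed plan before anyone reruns with `DRY_RUN=false`. This is much weaker than a real plan/apply cycle, since it shows commands rather than a diff against actual state.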
In an ideal world, we would have:
- ability to reconcile audit output with infra/gcp configs
- if audit reveals missing resources, create them
- if audit reveals unknown resources, suggest deletion or new configs to add
- automated deployment on PR merge (with reliable postsubmits / discoverable postsubmit results)
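The reconcile idea above can be sketched as a set difference between what the configs declare and what an audit finds. The sketch assumes both sides can be reduced to plain text files with one resource name per line; the `reconcile` function and its output format are hypothetical, and the real audit output is richer than a list of names.

```shell
#!/usr/bin/env bash
# Hypothetical reconcile step. Inputs: two files, one resource name per line.
#   $1: names declared in config
#   $2: names found by the audit
reconcile() {
    local declared="$1" audited="$2"
    # Declared but not found by the audit: candidates for creation.
    comm -23 <(sort -u "${declared}") <(sort -u "${audited}") | sed 's/^/create: /'
    # Found by the audit but not declared: suggest deletion or a new config entry.
    comm -13 <(sort -u "${declared}") <(sort -u "${audited}") | sed 's/^/unknown: /'
}
```

Running this from a postsubmit and failing when any `unknown:` line appears would be one way to surface drift, though deciding between deletion and adding config still needs a human.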
Even though I'm not a terraform native, to me this sounds really aligned with terraform:
- modules for organization / re-use
- plan / apply to build trust in automation
- terratest or something similar to enforce conventions
There might also be a middle ground where we want some common patterns described in yaml instead, e.g.
- staging-project -> results in kubernetes.io group, manifest file, gcp project, service accounts, iam permission changes, etc.
- public-app -> results in kubernetes.io group, aaa namespace, manifests, etc.
/reopen d'oh, forgot this critical step
@hasheddan had also discussed possibly demoing crossplane for us (slack ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1611757501019900)
@spiffxp: Reopened this issue.
@spiffxp preparing whenisgood as we speak :)
this is going to be fun, using kubernetes to manage the kubernetes infrastructure :D
:heart: to @ameukam for cross-linking PR's I've been contributing to refactor the bash in infra/gcp
https://github.com/kubernetes/k8s.io/pull/2188 takes a tentative step toward using YAML instead of hardcoded bash variables / arrays as the source of our configuration data
An update on the current state of the bash in infra/gcp.
Over the past few months, as I've worked to reconcile inconsistencies or unmanaged resources discovered via our automated audit PRs, I've been trying to nudge the bash in a consistent direction.
The principles I've tried to follow are:
- extract `lib_foo.sh` files for different GCP services, e.g. `lib_iam.sh` for IAM, `lib_gsm.sh` for Google Secret Manager
- try for some level of consistency in function naming:
  - `ensure_[removed_]_{resource}` for creation/deletion of resources
- refactor `ensure-foo.sh` files:
  - pull everything into functions such that a `main` entrypoint at the bottom is responsible for kicking off execution
    - makes it easier to test specific parts of a script
    - makes it easier to reuse other functions (vs. relying on order of definitions)
  - write functions such that they can operate on a list of args (e.g. `enable_services foo bar baz`)
  - use arrays more often, and pass those arrays as lists of args
    - less noise, and support for comments, when doing complicated multi-line things
    - ability to dynamically set flags
- scope the set of resources a script manages such that less-privileged-than-org-admin roles could run these scripts
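A rough illustration of how these principles might combine in a single `ensure-foo.sh`-style script. Everything here is a placeholder: `enable_services` only echoes instead of calling gcloud, the service names `foo` and `bar` come from the example above, and the `EXTRA_SERVICE` variable is a hypothetical knob added to show a dynamically built argument list.

```shell
#!/usr/bin/env bash
# Operates on a list of args, so callers can pass one service or many.
# Placeholder body: prints instead of enabling anything.
enable_services() {
    local service
    for service in "$@"; do
        echo "enable service: ${service}"
    done
}

main() {
    # An array allows a comment next to each entry, which a long
    # backslash-continued command line does not.
    local services=(
        # placeholder service names
        foo
        bar
    )
    # Entries (or flags) can be appended conditionally at runtime.
    if [[ -n "${EXTRA_SERVICE:-}" ]]; then
        services+=("${EXTRA_SERVICE}")
    fi
    enable_services "${services[@]}"
}

# Entrypoint at the bottom: functions above can be defined in any order,
# and the file can be sourced by a test without executing anything.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi
```

Because `main` only runs when the file is executed directly, a test can source the script and call `enable_services` or `main` in isolation.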
/milestone v1.23
/milestone v1.24