k8s.io Refactor infra/gcp/...

Right now it is set up as "concept-first". For example "ensure-staging" says "all of these are staging-like" and "ensure-prod" says "all of these are prod-like". That makes it hard to get a sense of what any one project has going on.

I propose refactoring it to "project-first": one list of projects, where each one says "I am prod-like" or "I am staging-like". Then I could simply say ensure-project k8s-foo-bar and all of that project's properties would be asserted.
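A minimal sketch of what project-first could look like in bash. The project names, the `project:kind` list format, and every function name here are hypothetical, and the per-kind functions only print what they would assert rather than touching GCP.

```shell
#!/usr/bin/env bash
# Hypothetical project-first layout: one list of projects, each declaring its kind.
# Project names and kinds below are made up for illustration.

# Each entry is "<project>:<kind>".
PROJECTS="
k8s-foo-bar:staging
k8s-foo-prod:prod
"

# Print the kind declared for a project, or fail if the project is not listed.
project_kind() {
    local project="$1" entry
    for entry in ${PROJECTS}; do
        if [ "${entry%%:*}" = "${project}" ]; then
            echo "${entry#*:}"
            return 0
        fi
    done
    echo "unknown project: ${project}" >&2
    return 1
}

# Placeholders for the real per-kind assertions (these only print).
ensure_staging_like() { echo "asserting staging-like properties on $1"; }
ensure_prod_like()    { echo "asserting prod-like properties on $1"; }

# Assert every property of a single project, based on its declared kind.
ensure_project() {
    local project="$1" kind
    kind="$(project_kind "${project}")" || return 1
    case "${kind}" in
        staging) ensure_staging_like "${project}" ;;
        prod)    ensure_prod_like "${project}" ;;
        *)       echo "unknown kind: ${kind}" >&2; return 1 ;;
    esac
}

ensure_project k8s-foo-bar
```

The point of the shape is that everything known about one project is reachable from its single entry in the list, so reading that entry tells you what the project has going on.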

This is very close to terraform territory, but I don't know TF well enough to make the "utility" functions less duplicated. @cblecker - is this worth pursuing?

Dec 16 '19 22:12 thockin

Yes, it is absolutely worth pursuing. The bash ensure stuff is getting out of hand IMO.

Dec 16 '19 23:12 cblecker

Shall we discuss the idea sometime and you can either volunteer yourself or write enough that we can solicit other volunteers?

Dec 16 '19 23:12 thockin

@thockin just taking a walk through the issues, maybe this is related: https://github.com/kubernetes/k8s.io/pull/523

I also saw the discussion on the mailing list; in my opinion Terraform is the best way to do this :)

I just don't think I know Terraform well enough to help with that part, but anyway I'm putting the PR here (again) so we can follow up.

Jan 09 '20 03:01 rikatz

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Apr 08 '20 03:04 fejta-bot

/remove-lifecycle stale

Apr 08 '20 03:04 bartsmykla

/area cluster-mgmt
/area cluster-infra
/kind cleanup

Apr 15 '20 19:04 spiffxp

Having dipped my toes into adding to this mess:

- I hated concept-first at first, and thought project-first was a great idea
- I'm now less sure, as some concepts span projects (image promotion needs access to all these, prow/boskos will need access to all those)
- I suspect the parts of our shell scripts that are for loops could correspond well to terraform modules
- I just need to say out loud that I am still a little freaked out by terraform. It's this whole other ecosystem that has churn and will need to be kept current. I wasn't confident enough to try migrating the google provider from 2.x to 3.x for aaa without maybe accidentally blowing away the cluster. OTOH as we write more bash in lib*.sh files we're also creating our own ecosystem with possibly inconsistent naming, lack of testing, etc.
- I fell back to using a shell script for creating projects, but when I have time I would be willing to see what rewriting that as terraform would be like

Apr 28 '20 17:04 spiffxp

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Jul 27 '20 18:07 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Aug 26 '20 19:08 fejta-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Sep 25 '20 20:09 fejta-bot

@fejta-bot: Closing this issue.

Sep 25 '20 20:09 k8s-ci-robot

/remove-lifecycle rotten
/lifecycle frozen
/priority important-longterm

I don't have the bandwidth for this, and this issue is maybe too broad to stay open, but I think the point stands that:

- continuing with our bash as-is will deepen our tech debt
- we aren't comfortable enough with our bash to allow automation to run it

Whatever we use, even if it's bash, we need:

- tests to enforce conventions
- tests that build trust in our ability to refactor
- tests that build trust in our ability to have automation run this
- confirmation of what changes will result

In an ideal world, we would have:

- ability to reconcile audit output with infra/gcp configs
  - if audit reveals missing resources, create them
  - if audit reveals unknown resources, suggest deletion or new configs to add
- automated deployment on PR merge (with reliable postsubmits / discoverable postsubmit results)
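One way to picture the reconcile step, assuming the configs and the audit output can each be reduced to a list of resource names. The `reconcile` function and the sample names are hypothetical; the sketch only reports differences and changes nothing.

```shell
#!/usr/bin/env bash
# Sketch: compare resources named in config against resources found by an audit.
# Inputs are newline-separated resource names; the sample lists are made up.

# reconcile <configured-list> <audited-list>
reconcile() {
    local configured audited
    configured="$(mktemp)"
    audited="$(mktemp)"
    # comm needs sorted input; blank lines are dropped so empty lists work.
    printf '%s\n' "$1" | sed '/^$/d' | sort -u > "${configured}"
    printf '%s\n' "$2" | sed '/^$/d' | sort -u > "${audited}"

    # In config but not found by the audit: candidates for creation.
    comm -23 "${configured}" "${audited}" | sed 's/^/create: /'
    # Found by the audit but not in config: suggest deletion or new config.
    comm -13 "${configured}" "${audited}" | sed 's/^/unknown: /'

    rm -f "${configured}" "${audited}"
}

reconcile "$(printf 'bucket-a\nbucket-b\nsa-foo')" \
          "$(printf 'bucket-b\nbucket-c\nsa-foo')"
```

With the sample lists, `bucket-a` is reported for creation and `bucket-c` as unknown. Keeping this step report-only is what would let a human or a presubmit confirm the changes before anything is applied.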

Even though I'm not a terraform native, to me this sounds really aligned with terraform:

- modules for organization / re-use
- plan / apply to build trust in automation
- terratest or something similar to enforce conventions

There might also be a middle ground where we want some common patterns described in yaml instead, e.g.

- staging-project -> results in kubernetes.io group, manifest file, gcp project, service accounts, iam permission changes, etc.
- public-app -> results in kubernetes.io group, aaa namespace, manifests, etc.

Jan 23 '21 18:01 spiffxp

/reopen

d'oh, forgot this critical step

@hasheddan had also discussed possibly demoing crossplane for us (slack ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1611757501019900)

Feb 08 '21 17:02 spiffxp

@spiffxp: Reopened this issue.

Feb 08 '21 17:02 k8s-ci-robot

@spiffxp preparing whenisgood as we speak :)

Feb 08 '21 17:02 hasheddan

this is going to be fun, using kubernetes to manage the kubernetes infrastructure :D

Feb 08 '21 17:02 rikatz

:heart: to @ameukam for cross-linking PRs I've been contributing to refactor the bash in infra/gcp

https://github.com/kubernetes/k8s.io/pull/2188 takes a tentative step toward using YAML instead of hardcoded bash variables / arrays as the source of our configuration data

Jun 11 '21 22:06 spiffxp

An update on the current state of the bash in infra/gcp.

Over the past few months, as I've worked to reconcile inconsistencies or unmanaged resources discovered via our automated audit PRs, I've been trying to nudge the bash in a consistent direction.

The principles I've tried to follow are:

- extract lib_foo.sh files for different GCP services, e.g. lib_iam.sh for IAM, lib_gsm.sh for Google Secret Manager
- try for some level of consistency in function naming:
  - ensure_[removed_]_{resource} for creation/deletion of resources
- refactor ensure-foo.sh files:
  - pull everything into functions such that a main entrypoint at the bottom is responsible for kicking off execution
    - makes it easier to test specific parts of a script
    - makes it easier to reuse other functions (vs. relying on order of definitions)
  - write functions such that they can operate on a list of args (e.g. enable_services foo bar baz)
  - use arrays more often, and pass those arrays as lists of args
    - less noise, and support for comments, when doing complicated multi-line things
    - ability to dynamically set flags
- scope the set of resources a script manages such that less-privileged-than-org-admin roles could run these scripts
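A rough sketch of how those conventions can fit together in one script. The function names, project name, and bucket name are illustrative and are not the actual infra/gcp helpers; commands are printed rather than run, so no gcloud call is made.

```shell
#!/usr/bin/env bash
# Sketch of the conventions above. All names are illustrative, and every
# function prints what it would do instead of calling gcloud.

set -o errexit
set -o nounset
set -o pipefail

# Operates on a list of args: enable_services <project> <service>...
enable_services() {
    local project="$1"; shift
    local service
    for service in "$@"; do
        echo "would enable ${service} in ${project}"
    done
}

# An ensure_ / ensure_removed_ pair for one (hypothetical) resource type.
ensure_bucket()         { echo "would create bucket $1 if missing"; }
ensure_removed_bucket() { echo "would delete bucket $1 if present"; }

# Build a command line in an array, so each flag can carry a comment
# and flags can be appended conditionally.
list_clusters_cmd() {
    local project="$1" verbose="${2:-false}"
    local cmd=(
        gcloud container clusters list
        --project "${project}"  # which project to inspect
        --format "value(name)"  # names only
    )
    if [ "${verbose}" = "true" ]; then
        cmd+=(--verbosity debug)  # flag set dynamically
    fi
    echo "${cmd[*]}"
}

# Main entrypoint: the only place where execution is kicked off.
main() {
    local services=(
        compute.googleapis.com    # needed for VMs
        container.googleapis.com  # needed for GKE
    )
    enable_services "k8s-foo-bar" "${services[@]}"
    ensure_bucket "k8s-foo-bar-logs"
    list_clusters_cmd "k8s-foo-bar" true
}

main "$@"
```

Because nothing runs outside of `main`, individual functions can be called on their own in a test without triggering the rest of the script.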

Jun 11 '21 22:06 spiffxp

/milestone v1.23

Sep 02 '21 19:09 spiffxp

/milestone v1.24

Dec 14 '21 22:12 ameukam
