gardener icon indicating copy to clipboard operation
gardener copied to clipboard

Refactor Gardener components to native controller-runtime components

Open rfranzke opened this issue 4 years ago • 8 comments

How to categorize this issue?

/area open-source dev-productivity scalability /kind cleanup technical-debt

What would you like to be added:

  • gardener-scheduler, gardener-controller-manager and gardenlet should be refactored such that they become native controller-runtime components.
  • gardener-resource-manager, gardener-admission-controller, gardener-seed-admission-controller are already native controller-runtime components.
  • gardener-apiserver is special anyways because it's neither a controller nor a webhook server.

Why is this needed:

  • Less code
  • Less duplication/hand-crafted logic
  • Easier maintenance
  • Metrics out-of-the-box
  • Streamlined logging

Steps

  • all controllers should implement the reconcile.Reconciler interface
    • [x] #4503
    • [x] https://github.com/gardener/gardener/pull/4891
  • remove current dummy controller metrics
    • [x] ... in gardener-scheduler: https://github.com/gardener/gardener/pull/4320
    • [x] ... in gardener-controller-manager, gardenlet: https://github.com/gardener/gardener/pull/4913
  • [x] check whether we can remove active worker counting in our controllers (dummy graceful termination)
  • streamline logging – replace logrus with logr
    • [x] https://github.com/gardener/gardener/pull/4925
    • [x] https://github.com/gardener/gardener/pull/5175
    • [x] add logcheck tool to ensure even number of args for Info, Error and WithValues calls, etc.: https://github.com/gardener/gardener/pull/5442
    • scheduler
      • [x] https://github.com/gardener/gardener/pull/4320
    • controller-manager
      • [x] inject logr logger into reconcile context to make it retrievable by logf.FromContext(ctx): https://github.com/gardener/gardener/pull/5057
      • [x] https://github.com/gardener/gardener/pull/5268
      • [x] https://github.com/gardener/gardener/pull/5290
      • [x] https://github.com/gardener/gardener/pull/5488
      • [x] https://github.com/gardener/gardener/pull/6253
    • gardenlet
      • [x] https://github.com/gardener/gardener/pull/6259
      • [x] https://github.com/gardener/gardener/pull/6260
      • [x] https://github.com/gardener/gardener/pull/6264
      • [x] https://github.com/gardener/gardener/pull/6265
      • [x] https://github.com/gardener/gardener/pull/6266
      • [x] https://github.com/gardener/gardener/pull/6267
      • [x] https://github.com/gardener/gardener/pull/6278
      • [x] https://github.com/gardener/gardener/pull/6282
    • ./test/... packages
      • [x] https://github.com/gardener/gardener/pull/6316
    • [ ] https://github.com/gardener/gardener/issues/5191
    • [x] https://github.com/gardener/gardener/pull/6332
    • [x] https://github.com/gardener/gardener/pull/6380
  • introduce c-r Manager, including leader election, client/cache creation, enhance healthz/readyz endpoints to include checks for informer/watch healthiness
    • controller-manager
      • [x] https://github.com/gardener/gardener/pull/6333
    • scheduler
      • [x] https://github.com/gardener/gardener/pull/4320
    • gardenlet
      • [x] https://github.com/gardener/gardener/pull/6688
  • add native c-r controllers to manager including predicates, event handlers, etc. as needed (can be done step by step): ⚠️ use predicate.GenerationChangedPredicate instead of comparing metadata.generation and status.observedGeneration and stop returning nil after adding finalizers in order to prevent duplicated reconciliations (basically implement option b) from https://github.com/gardener/gardener/issues/4730 where it is not done yet)
    • controller-manager
      • [x] @rfranzke: https://github.com/gardener/gardener/pull/6358
      • [x] @ary1992: https://github.com/gardener/gardener/pull/6615
      • [x] @timebertt: https://github.com/gardener/gardener/pull/6333
        • [x] @rfranzke: https://github.com/gardener/gardener/pull/6360
      • [x] @ary1992: https://github.com/gardener/gardener/pull/6462
      • [x] @rfranzke: https://github.com/gardener/gardener/pull/6629
      • [x] @ary1992: https://github.com/gardener/gardener/pull/6550
      • [x] @ary1992: https://github.com/gardener/gardener/pull/6423
      • [ ] @ary1992: Switch ManagedSeedSet controller to controller-runtime
      • [x] @acumino: https://github.com/gardener/gardener/issues/6348
      • [x] @timebertt: https://github.com/gardener/gardener/pull/6645
        • [x] @rfranzke: https://github.com/gardener/gardener/pull/6667
      • [x] @ary1992: https://github.com/gardener/gardener/pull/6391
      • [x] @rfranzke: https://github.com/gardener/gardener/pull/6621
      • [x] @rfranzke: https://github.com/gardener/gardener/pull/6636
      • [x] @rfranzke: https://github.com/gardener/gardener/pull/6620
    • scheduler
      • [x] https://github.com/gardener/gardener/pull/4320
    • gardenlet (introduce filtered cache with cache.Options.SelectorsByObject for watched objects)
      • [ ] @ary1992: Switch BackupBucket controller to controller-runtime
      • [ ] @ary1992: Switch BackupEntry controller to controller-runtime
      • [ ] @ary1992: Switch Bastion controller to controller-runtime
      • [ ] @rfranzke: https://github.com/gardener/gardener/pull/6733
      • [ ] @rfranzke: Switch Extensions controller to controller-runtime
        • replace custom retry logic on seed bootstrap for waiting until we get informers for our CRs
      • [ ] @rfranzke: Switch ManagedSeed controller to controller-runtime
      • [ ] @ary1992: Switch NetworkPolicy controller to controller-runtime
      • [ ] @rfranzke: Switch Seed controller to controller-runtime
      • [ ] @rfranzke: Switch Shoot controller to controller-runtime
      • [ ] @ary1992: Switch ShootSecret controller to controller-runtime
  • expose controller-runtime metrics (related to #2815)
    • [x] https://github.com/gardener/gardener-extension-provider-aws/pull/428 (gives a small preview at what metrics we can get from that)
    • [x] ... in gardener-scheduler: https://github.com/gardener/gardener/pull/4320
    • [x] ... in gardener-controller-manager: https://github.com/gardener/gardener/pull/6333
    • [x] ... in gardenlet: https://github.com/gardener/gardener/pull/6688
    • [ ] adapt seed-prometheus scrape config to allow scraping with http scheme: https://github.com/gardener/gardener/pull/4706#discussion_r713648371
    • [ ] add grafana dashboards to seed
  • [ ] https://github.com/gardener/gardener/issues/2829
  • [x] https://github.com/gardener/gardener/issues/4605
  • [ ] add timeout for initial cache syncs in controller/component constructors (similar to https://github.com/gardener/gardener-resource-manager/pull/102) or make sure that c-r does this out of the box

rfranzke avatar Jun 23 '21 14:06 rfranzke

All controllers should implement the reconcile.Reconciler interface (already done for GSCHED, GCM)

I'm currently working on this for the remaining gardenlet controllers as part of https://github.com/gardener/gardener/issues/2822

timebertt avatar Aug 11 '21 07:08 timebertt

All controllers should implement the reconcile.Reconciler interface (already done for GSCHED, GCM)

I'm currently working on this for the remaining gardenlet controllers as part of #2822

It's mostly done with https://github.com/gardener/gardener/pull/4503. Though, the shoot controller should be further refactored into individual reconcile.Reconcilers instead of using reconcile.Funcs.

timebertt avatar Aug 12 '21 15:08 timebertt

/assign @timebertt /roadmap internal /milestone 2021-Q4

rfranzke avatar Sep 29 '21 05:09 rfranzke

/in-progress

timebertt avatar Oct 21 '21 08:10 timebertt

refactor remaining controllers to reconcile.Reconciler interface (extensions controller: controllerinstallation/shootstate)

I tried to accomplish this, but I faced a few difficulties with the current implementation/architecture of the package:

  • the controller actually contains 2 controllers: a controllerinstallation controller and a shootstate controller, but both controllers are initialized and built from the shared artifacts map
  • the controllerinstallation reconcilers share a common state (kindToRequiredTypes) which is not so easy to do if they are split into dedicated structs

Given, that this controller package anyways needs significant restructuring work (actually it almost needs to be rebuilt from scratch) in order to make it work with native c-r constructs, I decided to not refactor this right now to reconcile.Reconciler. This provides only little to no value right now over reconcile.Func, as it cannot be simply added as a native controller to a c-r manager, so it needs to be touched again anyways. Once we are able to leverage all the c-r goodies to get rid of all our custom coding, we can rebuild it in a clean way.

So this will be done in a dedicated PR as part of step

add native c-r controllers to manager including predicates, event handlers, etc. as needed (can be done step by step)

timebertt avatar Oct 25 '21 16:10 timebertt

check whether we can remove active worker counting in our controllers (dummy graceful termination)

I look into this and played around with it a little bit. The result:

  • current implementation doesn't prevent the process from exiting, if items are left in the queue or workers have not finished yet
  • too much effort to fix current implementation
  • could remove it now in favor of graceful termination feature, that we will get with native c-r controllers
  • but it also doesn't hurt, so let's keep it for now and simply remove it together in individual refactorings of controllers

Hence, I marked the step as completed for now. cc @rfranzke

timebertt avatar Oct 27 '21 12:10 timebertt

/assign @rfranzke @timebertt

timebertt avatar Jun 30 '22 06:06 timebertt

/assign @ary1992 for some of the controller refactorings for controller-manager, see above

rfranzke avatar Jul 20 '22 08:07 rfranzke

The Gardener project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

gardener-ci-robot avatar Mar 22 '23 14:03 gardener-ci-robot

/remove-lifecycle stale

rfranzke avatar Mar 22 '23 15:03 rfranzke

Now that #7147 is merged, we can close this issue, correct? :)

timebertt avatar Mar 29 '23 06:03 timebertt

Yes 🚀 /close

shafeeqes avatar Mar 29 '23 06:03 shafeeqes

@shafeeqes: Closing this issue.

In response to this:

Yes 🚀 /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

gardener-prow[bot] avatar Mar 29 '23 06:03 gardener-prow[bot]