operator-sdk icon indicating copy to clipboard operation
operator-sdk copied to clipboard

Designing lean operators

Open fgiloux opened this issue 4 years ago • 18 comments

What is the URL of the document?

https://sdk.operatorframework.io/docs/best-practices/common-recommendation/

Which section(s) is the issue in?

This is about adding a new section

What needs fixing?

One of the pitfalls that many operators are failing into is that they watch resources with high cardinality like secrets possibly in all namespaces. Besides the security aspect, which may be worth mentioning this has a massive impact on the memory used by the controller on big clusters. Such resources should either be filtered by label or possibly avoided by for instance using a custom resource instead.

Also it may be worth mentioning ResyncPeriod. Setting it to low would also cause the unnecessary consumption of CPU cycles.

Additional context

This is a follow up of: https://github.com/operator-framework/operator-sdk/pull/5353/files#r753112161

fgiloux avatar Nov 21 '21 11:11 fgiloux

This sounds like a great idea to add to the best practices section in the docs. Would you be interested in contributing this?

jberkhahn avatar Nov 22 '21 19:11 jberkhahn

Related info on the c-r: https://github.com/kubernetes-sigs/controller-runtime/blob/master/designs/use-selectors-at-cache.md

tlwu2013 avatar Nov 22 '21 19:11 tlwu2013

@fgiloux just wanted to follow up on this issue. Would you like to contribute on this, or have any suggestions/tips which we could add in the documentation.

varshaprasad96 avatar Nov 29 '21 19:11 varshaprasad96

@varshaprasad96 yes I would like to contribute. I have however a couple of things higher on my priority list right now, so that I am also ok with another person taking the lead on the subject if someone happens to have more free cycles. For providing good guidance I would need to experiment, which requires time.

The link shared by @tlwu2013 is really to the point.

fgiloux avatar Nov 29 '21 19:11 fgiloux

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Feb 27 '22 23:02 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Mar 29 '22 23:03 openshift-bot

/remove-lifecycle rotten

fgiloux avatar Mar 30 '22 09:03 fgiloux

/remove-lifecycle stale

camilamacedo86 avatar Mar 30 '22 17:03 camilamacedo86

/remove-lifecycle rotten

fgiloux avatar Mar 31 '22 07:03 fgiloux

/lifecycle frozen

asmacdo avatar May 09 '22 16:05 asmacdo

Based on the discussion, the required task for this issue to include docs on how to filter caches using labels in controller-runtime. The original design doc is here: https://github.com/kubernetes-sigs/controller-runtime/blob/master/designs/use-selectors-at-cache.md

varshaprasad96 avatar Oct 12 '22 18:10 varshaprasad96

This slide from Kubecon may be helpful

image

asmacdo avatar Oct 31 '22 19:10 asmacdo

I've taken this as assigned on behalf of @rocrisp who's not yet a member of the org.

asmacdo avatar Nov 02 '22 14:11 asmacdo

Hey @rocrisp,

We are experiencing high memory usage and are hoping that this guide helps resolve that. I've read https://sdk.operatorframework.io/docs/best-practices/designing-lean-operators/ and I'm bit unclear on a few points.

My use case is an operator with multiple controllers for several Custom Resources (Environments and Sites) that are in multiple namespaces across the cluster.

  • Controller A is configured For(Environments) with Owns(Secrets) and Owns(ConfigMaps)
  • Controller B is configured For(Sites) with Owns(Secrets) and Owns(ConfigMaps)

The guide says we can override the default cache with NewCache and the example shows &corev1.Secret{}

  • Does the new cache behave like the default cache but with an override for secrets or does it only cache secrets and the developer must add every resource the controller watches?
  • The Secrets and Configmaps mentioned in my example above have ownerReferences but it looks like only Labels and Fields are supported. Is the recommendation to add a label to all Secrets and Configmaps owned by the operator or is there a way to use the ownerReference with cache.SelectorsByObject?

And thanks for the guide! It's very helpful ⭐

blkperl avatar Jan 20 '23 01:01 blkperl

@blkperl Yes, the new cache behaves like the default cache but with an override for secrets per labels you create. The watch is in the controller code so not the same as the filter code in main.go when you setup a newManager. In essence, the controller and the filter filters are two different things. no? In theory, you should be able to add ownerRefence to the secret label. Thank you for your feedback on the doc and with your help we can update the doc to make it more clear.

rocrisp avatar Jan 20 '23 14:01 rocrisp

@asmacdo Can you shed more light on the question above? Thank you.

rocrisp avatar Jan 20 '23 14:01 rocrisp

I meant utilizing the existing ownerReference as a field rather than copying it to a label. It's tricky though because it doesn't seem queryable in golang.

k get secret env-config -o json | jq '.metadata.ownerReferences[0].kind'
"Environment"

The approach that I think we might take is using labels.NewRequirement since we have an existing label with an internal id and we don't care about the value.

	envIdReq, err := labels.NewRequirement("environment-id", selection.Exists, []string{})
	if err != nil {
		// handle error
	}
	envIdSelector := labels.NewSelector()
	envIdSelector = envIdSelector.Add(*envIdReq)
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        // other options ....
	NewCache: cache.BuilderWithOptions(cache.Options{
			SelectorsByObject: cache.SelectorsByObject{
				&corev1.ConfigMap{}: {
					Label: envIdSelector,
				},
			},
	}),
})

blkperl avatar Jan 20 '23 22:01 blkperl

@blkperl this approach looks right to me. If it doesn't work come by the channel and someone will help.

If it works, please consider adding anything to the docs that would have helped you :)

asmacdo avatar Jan 21 '23 22:01 asmacdo