
Add ability for Subscription config to configure a specific container in the operator pod

Open dkwon17 opened this issue 1 year ago • 7 comments

Today, we can use the Subscription config to configure resource limits and requests for the operator pod: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources

However, the limits and requests are applied to all containers of the pod. It doesn't appear possible to configure a specific container of the pod with the Subscription config.

Proposal

My proposal is to add functionality that allows per-container configuration by introducing config.containers, which can override the default pod spec:

```yaml
config:
  containers:
    - name: <container I want to configure>
      resources:
        limits:
          memory: "2Gi"
```
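For context, here is how the proposed field might sit in a full Subscription (the config.containers field is the proposal and does not exist today; the channel, source, and package values are placeholders):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: operators
spec:
  channel: stable
  name: my-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    containers:            # proposed field, not an existing API
      - name: controller
        resources:
          limits:
            memory: "2Gi"
```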

Use case

Our operator pod contains two containers:

  • controller
  • kube-rbac-proxy for authorizing requests

We would like to configure the memory limit for only the controller container and not the kube-rbac-proxy.

dkwon17 avatar Nov 28 '24 19:11 dkwon17

Hi @dkwon17,

An Extension can have one to many containers, and an RFE (Request for Enhancement) would not inherently be specific to individual container names (such as controller or kube-rbac-proxy), since that would not fit all use cases and users. While the idea is interesting, it raises the question of how we would configure resources for one to many containers via a Subscription.

One potential solution for this use case could be for the Content/Operator Author to create an API, such as a Config CRD. Then, cluster users could define the resource configurations like memory and CPU requests on its CR, and the solution would have the ConfigController watch its CR and update the containers on the cluster based on the info provided.
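A sketch of what such an author-defined CR could look like (the group, kind, and field names here are all hypothetical; the operator's ConfigController would watch this resource and update its workloads accordingly):

```yaml
apiVersion: example.com/v1alpha1   # hypothetical group/version
kind: OperatorConfig               # hypothetical kind defined by the author
metadata:
  name: my-operator-config
spec:
  containers:
    - name: controller
      resources:
        limits:
          memory: "2Gi"
```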

What do you think of this approach? Wouldn't that address your need? Did you already try this?

camilamacedo86 avatar Dec 23 '24 17:12 camilamacedo86

Sorry for the late comment, just catching up from the holiday break.

I'm not sure I see how the above response rules out OLM doing this? The proposed alternative puts a lot of burden on operator authors to implement their own solution and we'd certainly see a mix of approaches.

The proposal calls for applying limits based on container names, which are unique, and even if the cluster extension manages several pods with same-named containers, applying the limits to all of them is consistent with Kubernetes resource configuration.

The proposal seems valid to me.

bentito avatar Jan 08 '25 22:01 bentito

Hi @bentito,

Thank you for checking this one. I also gave it a second thought; let me try to summarize.

What is the RFE:

In the context of OLM v1 (operator-controller), I understand that the RFE is about enhancing the functionality of ClusterExtensions. Specifically, the RFE proposes a change to the API to include an option for adding configurations like the following:

```yaml
spec:
  podOverrides:
    containers:
      - name: my-container
        resources:
          limits:
            memory: "2Gi"
      - name: my-container-2
        resources:
          limits:
            memory: "2Gi"
      - name: my-container-3
        resources:
          limits:
            memory: "2Gi"
```

The idea is for OLM to watch/observe all Pods on the cluster, identify those that match the specified container names, and apply the defined resource configurations (podOverrides) to them.
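The name-based matching described here can be sketched as follows. This is a minimal, self-contained illustration using simplified stand-in types (a real implementation would use k8s.io/api/core/v1 and issue API updates); it only shows the matching rule: containers whose names appear in the overrides get the new limit, across all pods, and everything else is left untouched.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes types involved.
type Container struct {
	Name        string
	MemoryLimit string
}

type Pod struct {
	Name       string
	Containers []Container
}

type Override struct {
	Name        string // container name to match
	MemoryLimit string
}

// applyOverrides sets the memory limit on every container whose name
// matches an override, across all pods, leaving other containers untouched.
func applyOverrides(pods []Pod, overrides []Override) {
	limits := make(map[string]string, len(overrides))
	for _, o := range overrides {
		limits[o.Name] = o.MemoryLimit
	}
	for pi := range pods {
		for ci := range pods[pi].Containers {
			if limit, ok := limits[pods[pi].Containers[ci].Name]; ok {
				pods[pi].Containers[ci].MemoryLimit = limit
			}
		}
	}
}

func main() {
	pods := []Pod{
		{Name: "operator-pod", Containers: []Container{
			{Name: "controller", MemoryLimit: "1Gi"},
			{Name: "kube-rbac-proxy", MemoryLimit: "128Mi"},
		}},
	}
	applyOverrides(pods, []Override{{Name: "controller", MemoryLimit: "2Gi"}})
	fmt.Println(pods[0].Containers[0].MemoryLimit) // controller updated to "2Gi"
	fmt.Println(pods[0].Containers[1].MemoryLimit) // proxy untouched: "128Mi"
}
```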


Key Questions

  1. Should OLM only manage resource configurations for Pods defined in the CSV bundle, or for any Pod related to the solution?

    • If restricted to the CSV bundle:

      a) Would this fully address the requirement? ClusterExtension users may need the ability to adjust resource configurations for any Pod related to the solution, not just those defined in the CSV bundle. Therefore, it seems that Operator Authors/Content Authors would still carry the same burden you described above, and the RFE would not solve the problem.

      b) If support is extended to other formats (e.g., HelmChart), how would we delimit which Pods can or cannot be overwritten by OLM?

  2. If OLM allows PodOverrides for ANY Pod installed/managed via the ClusterExtension:

    • This approach introduces new challenges:

      a) Overstepping responsibilities: the Operator/HelmChart is responsible for managing the resources it creates, not OLM.

      b) Risk of race conditions: both OLM and the solution could attempt to manage the same Pods, leading to potential conflicts and instability.

Let me know your thoughts!

Best regards,

camilamacedo86 avatar Jan 09 '25 06:01 camilamacedo86

@dkwon17 could you weigh in and clarify between options 1 and 2 in what @camilamacedo86 is asking?

bentito avatar Jan 13 '25 19:01 bentito

@camilamacedo86 @bentito sorry for the delay, I think what I had in mind was 1: restricting to the CSV bundle, since the operator pod (the pod that runs the controller) is what I would like to configure.

> One potential solution for this use case could be for the Content/Operator Author to create an API, such as a Config CRD. Then, cluster users could define the resource configurations like memory and CPU requests on its CR, and the solution would have the ConfigController watch its CR and update the containers on the cluster based on the info provided.

What I want to achieve is a way to control the resource limits of specific containers in the pod that runs the ConfigController itself. I don't think it's possible for the ConfigController to watch the CR and update its own pod, is it? If it is possible, do you have a small example?

dkwon17 avatar Jan 13 '25 20:01 dkwon17
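On the self-update question above: a running pod cannot generally change its own resource limits in place, but a controller can update the Deployment that owns it; the Deployment controller then rolls out a replacement pod with the new limits, restarting the controller under them. A minimal sketch with simplified stand-in types (all names hypothetical; real code would fetch and update the Deployment through client-go or controller-runtime):

```go
package main

import "fmt"

// Simplified stand-ins for apps/v1 Deployment and core/v1 container fields.
type Container struct {
	Name        string
	MemoryLimit string
}

type Deployment struct {
	Name       string
	Containers []Container // pod template containers
}

// reconcileConfig shows the shape of the suggested ConfigController: on a
// change to the (hypothetical) Config CR, it rewrites the resource limit in
// the operator Deployment's pod template. It returns true when the template
// changed, i.e. when the caller would issue an Update/Patch, which in turn
// makes the Deployment controller replace the running pod.
func reconcileConfig(d *Deployment, container, memoryLimit string) bool {
	for i := range d.Containers {
		if d.Containers[i].Name == container {
			if d.Containers[i].MemoryLimit != memoryLimit {
				d.Containers[i].MemoryLimit = memoryLimit
				return true
			}
			return false // already at desired state; no update needed
		}
	}
	return false // no such container in the template
}

func main() {
	d := &Deployment{Name: "my-operator", Containers: []Container{
		{Name: "controller", MemoryLimit: "1Gi"},
		{Name: "kube-rbac-proxy", MemoryLimit: "128Mi"},
	}}
	changed := reconcileConfig(d, "controller", "2Gi")
	fmt.Println(changed, d.Containers[0].MemoryLimit) // true 2Gi
}
```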

I implemented a proof of concept as part of https://github.com/operator-framework/operator-controller/pull/1418. We should revisit that PR when we have time to pick this issue up.

joelanford avatar Jan 21 '25 14:01 joelanford

@joelanford Hey, I'm curious which part of your POC contains the changes that make this work. Is there a follow-up issue that tracks implementing it?

kubasikus avatar Apr 07 '25 12:04 kubasikus

Issues go stale after 90 days of inactivity. If there is no further activity, the issue will be closed in another 30 days.

github-actions[bot] avatar Jul 07 '25 01:07 github-actions[bot]

This issue has been closed due to inactivity.

github-actions[bot] avatar Aug 06 '25 01:08 github-actions[bot]