operator subscription does not respect config.selector when it applies resource limits to the operator pods

Open xiangjingli opened this issue 4 years ago • 6 comments

Bug Report

There are three pods in our multicluster subscription community operator: https://github.com/operator-framework/community-operators/tree/master/community-operators/multicluster-operators-subscription

Although the selector in the operator Subscription CR is set to match two of the three pods, the resource limits are applied to all three pods.

To reproduce it:

  1. Go to the OpenShift OperatorHub web console and install the community operator Multicluster Subscription Operator V0.2.3 in a namespace, e.g. open-cluster-management.

  2. Make sure the three pods, each with a different label, are installed, and that no resource limit is applied to any of them.

% oc get pods -l "app in (multicluster-operators-standalone-subscription, multicluster-operators-hub-subscription, multicluster-operators-application)"
NAME                                                              READY   STATUS    RESTARTS   AGE
multicluster-operators-application-585d76cd-9p5ht                 4/4     Running   0          7m32s
multicluster-operators-hub-subscription-84b776d654-pmmjp          1/1     Running   0          7m32s
multicluster-operators-standalone-subscription-868b55fdbc-dmsct   1/1     Running   0          7m32s

% oc get pods multicluster-operators-standalone-subscription-7c8cbf885f-tjpb8 -o yaml 
apiVersion: v1
kind: Pod
spec:
  containers:
  - image: quay.io/open-cluster-management/multicluster-operators-subscription:community-2.1
    resources: {}

status:
  qosClass: BestEffort
  3. Apply the new operator Subscription CR:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multicluster-operators-subscription
  namespace: open-cluster-management
spec:
  channel: release-2.1
  config:
    selector:
      matchExpressions:
      - key: app 
        operator: In
        values:
        - multicluster-operators-standalone-subscription
        - multicluster-operators-hub-subscription
    resources: 
      limits:
        cpu: 750m
        memory: 2Gi
      requests:
        cpu: 150m
        memory: 128Mi
  installPlanApproval: Automatic
  name: multicluster-operators-subscription
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: multicluster-operators-subscription.v0.2.3
  4. Expected result

The resource limit is applied to the two pods - multicluster-operators-standalone-subscription and multicluster-operators-hub-subscription

The pod multicluster-operators-application should have no resource limit

  5. Actual result

The resource limit is applied to all the three pods, including multicluster-operators-application

% oc get pods multicluster-operators-application-585d76cd-96drh -o yaml 
apiVersion: v1
kind: Pod
spec:
  containers:
  - image: quay.io/open-cluster-management/multicluster-operators-placementrule:community-2.1
    resources:
      limits:
        cpu: 750m
        memory: 2Gi
      requests:
        cpu: 150m
        memory: 128Mi
...
status:
  qosClass: Burstable

Environment: OCP V4.5.6, built-in OLM
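
To make the expected behaviour concrete, here is a minimal Python sketch of standard Kubernetes label-selector semantics for `matchExpressions`. It is not OLM code. The `matches` helper is hypothetical, and the `app` label values are assumed to equal the names used in the `oc get pods -l` command above.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], match_expressions: List[dict]) -> bool:
    """Return True if the labels satisfy every expression (expressions are ANDed)."""
    for expr in match_expressions:
        key = expr["key"]
        op = expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A missing key also satisfies NotIn.
            ok = labels.get(key) not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Selector copied from spec.config.selector in the Subscription above.
match_expressions = [
    {
        "key": "app",
        "operator": "In",
        "values": [
            "multicluster-operators-standalone-subscription",
            "multicluster-operators-hub-subscription",
        ],
    }
]

# Assumed pod labels (one "app" label per pod).
pods = {
    "multicluster-operators-application": {"app": "multicluster-operators-application"},
    "multicluster-operators-hub-subscription": {"app": "multicluster-operators-hub-subscription"},
    "multicluster-operators-standalone-subscription": {"app": "multicluster-operators-standalone-subscription"},
}

selected = sorted(name for name, labels in pods.items() if matches(labels, match_expressions))
print(selected)
```

Under these semantics only the hub and standalone pods are selected, so multicluster-operators-application should keep `resources: {}`.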

Suggested Solution

  1. In our multicluster subscription operator CSV, we have defined different resource limits for each container in each pod. These seem to be ignored by OLM. Is there a specific reason for that?

It would be great if OLM supported setting container resource limits directly through the operator CSV, which would be consistent with the Kubernetes Deployment API specification.

  2. If OLM only intends to set resource limits through the operator Subscription CR, one generic solution would be to support multiple pod selectors, with resources defined under each selector. Different resource limits/requests could then be applied to all containers in the pods matched by each selector.
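
The second suggestion could behave roughly like the Python sketch below. Everything in it is hypothetical: OLM has no such list-of-selectors API, the `configs` shape and the `resources_for` helper are invented for illustration, and the values in the second entry are made up. Only the `In` operator is handled, and the first matching entry wins.

```python
from typing import Dict, List, Optional


def selector_matches(labels: Dict[str, str], match_expressions: List[dict]) -> bool:
    """Evaluate a selector that uses only the 'In' operator (enough for this sketch)."""
    for expr in match_expressions:
        if expr["operator"] != "In":
            raise ValueError("this sketch only supports the In operator")
        if labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def resources_for(labels: Dict[str, str], configs: List[dict]) -> Optional[dict]:
    """Return the resources of the first config entry whose selector matches, else None."""
    for entry in configs:
        if selector_matches(labels, entry["selector"]):
            return entry["resources"]
    return None


# Hypothetical per-selector configuration; not a real Subscription field.
configs = [
    {
        "selector": [
            {
                "key": "app",
                "operator": "In",
                "values": [
                    "multicluster-operators-standalone-subscription",
                    "multicluster-operators-hub-subscription",
                ],
            }
        ],
        # Values taken from the Subscription in the reproduction steps.
        "resources": {
            "limits": {"cpu": "750m", "memory": "2Gi"},
            "requests": {"cpu": "150m", "memory": "128Mi"},
        },
    },
    {
        "selector": [
            {"key": "app", "operator": "In", "values": ["multicluster-operators-application"]}
        ],
        # Illustrative values only.
        "resources": {"limits": {"cpu": "500m", "memory": "1Gi"}},
    },
]

hub = resources_for({"app": "multicluster-operators-hub-subscription"}, configs)
application = resources_for({"app": "multicluster-operators-application"}, configs)
unmatched = resources_for({"app": "some-other-pod"}, configs)
print(hub, application, unmatched)
```

A pod that matches no entry gets `None` here, which corresponds to leaving its resources untouched.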

xiangjingli avatar Oct 06 '20 03:10 xiangjingli

any update?

xiangjingli avatar Oct 20 '20 15:10 xiangjingli

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 19 '20 17:12 stale[bot]

It looks like this is still reproducible when I modify the Subscription you pasted above on a kind cluster. What's strange is subscription.Spec.Config.Selector is discoverable when firing off kubectl explain subscription.spec.config commands, but I'm unable to find anywhere we document the usage of this field, like we do with the other configuration-fields listed in https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md.

It's unclear to me right now whether this kind of use case is something we'd need to support and treat as a bug, but I haven't been able to find any documentation mentioning this field outside of API discovery, and it's possible that the Subscription API wouldn't exist in any new v2 OLM API designs.

cc @dmesser

timflannagan avatar Feb 11 '22 21:02 timflannagan

I came here looking for a way to configure the resource requests and limits for individual deployments in the cert-manager OLM package which has three deployments: cert-manager, cert-manager-webhook, and cainjector.

If this is not possible, could the subscription-config document at least be updated to include this known limitation?

https://github.com/operator-framework/api/blob/33310d6154f344da5494c788451bd34cdf54711c/pkg/operators/v1alpha1/subscription_types.go#L43-L47

// Selector is the label selector for pods to be configured.
// Existing ReplicaSets whose pods are
// selected by this will be the ones affected by this deployment.
// It must match the pod template's labels.

xref: https://github.com/cert-manager/website/pull/983

wallrj avatar May 19 '22 10:05 wallrj

@timflannagan Out of interest, where are the "v2 OLM API designs"? It'd be interesting to link to these from our documentation to preempt some of the questions we get about configuring the cert-manager OLM package:

  • https://cert-manager.io/docs/installation/operator-lifecycle-manager/#configuration

wallrj avatar May 20 '22 09:05 wallrj

@wallrj I don't think we have anything in a central location, but I'd recommend poking at a strawman that aims to outline the newly proposed APIs/systems in OLM v2 for an overview.

We've been iterating on the lower level components, like https://github.com/operator-framework/rukpak quite a bit recently, and that project is getting close to a state that provides value by itself, and has the primitives present that higher level components can utilize. That project has support for installing and reconciling arbitrary Kubernetes resources, so if that sounds interesting to you, I'd also recommend checking out that project and playing around with it.

timflannagan avatar May 20 '22 15:05 timflannagan