operator-sdk

Scorecard should show warnings from the k8s API response when an Operator makes requests to deprecated REST APIs

Open camilamacedo86 opened this issue 4 years ago • 4 comments

Feature Request

Describe the problem you need a feature to resolve.

I'd like the default Scorecard test results to include information about APIs that are deprecated or will be removed in upcoming versions.

Goal

The goal is to check whether Scorecard could gather the information that the Kubernetes API already provides when an operator running on the cluster makes requests using deprecated APIs.

Non-Goals

  • Implement any logic that checks whether the operator uses a deprecated API. The k8s API already does this.
  • Implement any logic that lints the files/manifests. Static checks on the manifests are already handled by the OperatorHub validator in the bundle validate command.
  • Implement a new feature for running tests against the Operator on the cluster. Scorecard already provides that.

Motivation

Scorecard is the SDK feature that is capable of running tests on the cluster. Just as it is recommended to run regression tests on a project and then check its events/metrics for warnings about deprecated APIs, the default Scorecard tests could gather those warnings and append them to their results.

Use Case

As an operator author, I would like to gather the warnings raised by the k8s API while my operator is running and being tested by Scorecard on the cluster, so that I am informed ahead of time that my operator is making requests to deprecated APIs.

Describe the solution you'd like.

Implement a Scorecard check that looks for the events/metrics raised by the k8s API, gathers their warnings, and returns them as part of the test results. More info: https://kubernetes.io/blog/2020/09/03/warnings/#deprecation-warnings. E.g.:

  1. Run the Scorecard tests to trigger the reconcile, for example by applying the operator's CRs on the cluster.
  2. Then gather the metrics raised by the k8s API and append them to the Scorecard results. For example, we can check the warnings by looking at the events with kubectl get events --field-selector="reason=AppliedWithWarnings" --all-namespaces:
image

OR

kubectl get --raw /metrics | prom2json | jq ' .[] | select(.name=="apiserver_requested_deprecated_apis").metrics[].labels '

  3. Append the warnings to the Scorecard results.
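As a rough illustration of step 2, the sketch below reads the raw text returned by kubectl get --raw /metrics and pulls out the apiserver_requested_deprecated_apis samples without needing prom2json or jq. The function names and the SAMPLE text are made up for this example, and the parser only handles simple labelled sample lines, so treat it as a starting point rather than a complete Prometheus parser.

```python
import re

# One sample line of the Prometheus text format: name{label="value",...} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

METRIC = "apiserver_requested_deprecated_apis"

# Illustrative input only; real output comes from `kubectl get --raw /metrics`.
SAMPLE = """\
# TYPE apiserver_requested_deprecated_apis gauge
apiserver_requested_deprecated_apis{group="batch",removed_release="1.25",resource="cronjobs",subresource="",version="v1beta1"} 1
apiserver_request_total{code="200",group="batch",resource="cronjobs",verb="LIST",version="v1beta1"} 42
"""


def deprecated_apis(metrics_text):
    """Return the label set of every deprecated-API sample whose gauge is 1."""
    found = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unlabelled samples
        name, labels, value = match.groups()
        if name == METRIC and float(value) == 1:
            found.append(dict(LABEL_RE.findall(labels)))
    return found


def to_warnings(apis):
    """Render one human-readable warning per deprecated API."""
    return [
        f"{api.get('resource', '')} in {api.get('group', '')}/{api.get('version', '')} "
        f"is deprecated (removed in {api.get('removed_release', 'unknown')})"
        for api in apis
    ]


if __name__ == "__main__":
    for warning in to_warnings(deprecated_apis(SAMPLE)):
        print(warning)
```

Note that this metric is cluster-wide: it records that some client requested the API, not which client did.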

Additional Context:

OCP introduced Prometheus alerts based on the k8s metrics. Two alerts were introduced with OpenShift 4.8:

  • APIRemovedInNextReleaseInUse - for APIs that will be removed in the next release.
  • APIRemovedInNextEUSReleaseInUse - for APIs that will be removed in the next EUS release.

More information on alerts, how to retrieve them or how to get notified is available in OpenShift documentation.

/language go /language ansible /language helm

camilamacedo86 avatar Aug 09 '21 17:08 camilamacedo86

It could be just me, but this “reporting the deprecated/removed APIs” sounds like a really handy scorecard test that is worth becoming a default test and benefits all Operators

tlwu2013 avatar Aug 09 '21 18:08 tlwu2013

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Nov 21 '21 21:11 openshift-bot

/lifecycle frozen

camilamacedo86 avatar Nov 24 '21 20:11 camilamacedo86

The following is additional information that can help us implement this as a proposed solution.

Proposed solution

Develop a Scorecard custom test that identifies the deprecated APIs using the k8s metrics and raises warnings based on how the Operator's services (CRs) use them. This test ought to work dynamically, without needing to be updated every time an API is flagged as deprecated.

What value can it bring?

  • This solution would allow us to notify Operator authors without spending effort on each new version/deprecation made in the k8s API.
  • Provide a functional example with Scorecard. Currently, we have no Scorecard test that is truly functional or that exercises the cluster. The available Scorecard tests mainly lint the files used in the bundles, much as the static validators do.

TL;DR: Technical Details

Is it enough just to install the operator?

Probably not. It likely requires applying all CRs to ensure that the operator hits the deprecated APIs (e.g. those removed in 1.25 and 1.26) and triggers the Kube metrics.
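One possible source for "all CRs" is the alm-examples annotation of the bundle's ClusterServiceVersion, which holds a JSON array of example CRs. The sketch below assumes the CSV has already been loaded into a dict (for instance with a YAML library); the function name and the sample CSV are hypothetical and only illustrate the shape of the data.

```python
import json


def example_crs(csv):
    """Return the example CRs listed in a CSV's alm-examples annotation."""
    annotations = csv.get("metadata", {}).get("annotations", {})
    # The annotation value is a JSON-encoded string, not a nested object.
    return json.loads(annotations.get("alm-examples", "[]"))


# Illustrative CSV fragment; a real one comes from the operator bundle.
csv = {
    "metadata": {
        "annotations": {
            "alm-examples": json.dumps([
                {
                    "apiVersion": "keycloak.org/v1alpha1",
                    "kind": "KeycloakBackup",
                    "metadata": {"name": "example-keycloakbackup"},
                    "spec": {},
                }
            ])
        }
    }
}

if __name__ == "__main__":
    for cr in example_crs(csv):
        print(cr["apiVersion"], cr["kind"], cr["metadata"]["name"])
```

Each returned object could then be applied to the cluster so that the operator reconciles it. Whether the examples cover every code path that touches a deprecated API depends on the operator.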

How to use the Kube metrics/alerts?

We can run a command such as the following to get the metrics. On its own that is not good enough, because it does not exclude requests that were not made by the operator, so we still need to find out how to filter those out.

kubectl get --raw /metrics | prom2json | jq '
  # set $deprecated to a list of deprecated APIs
  [
    .[] | 
    select(.name=="apiserver_requested_deprecated_apis").metrics[].labels |
    {group,version,resource}
  ] as $deprecated 
  
  |
  
  # select apiserver_request_total metrics which are deprecated
  .[] | select(.name=="apiserver_request_total").metrics[] |
  select(.labels | {group,version,resource} as $key | $deprecated | index($key))
'

We might be able to:

  • combine these metrics (https://github.com/kubernetes/kubernetes/blob/d7545437267be8cd162e76b81c0cf4a47dd33208/test/instrumentation/testdata/stable-metrics-list.yaml#L241-L266) into a query that gives something specific to the operator (see the Kubernetes Deprecation Policy section on API resources: https://kubernetes.io/docs/reference/using-api/deprecation-policy/#rest-resources-aka-api-objects).
  • use the system_client label. The queries that define the OCP alerts use it to exclude some operators/controllers, so we might be able to do the same; see: https://github.com/openshift/cluster-kube-apiserver-operator/blob/release-4.10/bindata/assets/alerts/api-usage.yaml
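To make the second idea concrete, here is a minimal sketch of the same join the jq command performs, with an extra exclusion on system_client. It assumes the label sets have already been extracted from the metrics, that system_client is present on apiserver_request_total as in the OCP alert rules, and that the excluded client names are illustrative values rather than a vetted list.

```python
def api_key(labels):
    """Identify an API by the labels both metrics share."""
    return (labels.get("group", ""), labels.get("version", ""), labels.get("resource", ""))


def requests_to_deprecated_apis(deprecated, requests, excluded_clients=()):
    """Keep request samples that hit a deprecated API and are not from an excluded client.

    deprecated: label dicts from apiserver_requested_deprecated_apis
    requests:   label dicts from apiserver_request_total
    """
    deprecated_keys = {api_key(labels) for labels in deprecated}
    return [
        labels
        for labels in requests
        if api_key(labels) in deprecated_keys
        and labels.get("system_client", "") not in excluded_clients
    ]


# Illustrative data only.
deprecated = [{"group": "batch", "version": "v1beta1", "resource": "cronjobs"}]
requests = [
    {"group": "batch", "version": "v1beta1", "resource": "cronjobs", "verb": "POST"},
    {"group": "batch", "version": "v1beta1", "resource": "cronjobs", "verb": "LIST",
     "system_client": "kube-controller-manager"},
    {"group": "apps", "version": "v1", "resource": "deployments", "verb": "GET"},
]

if __name__ == "__main__":
    hits = requests_to_deprecated_apis(
        deprecated, requests, excluded_clients={"kube-controller-manager"}
    )
    for labels in hits:
        print(labels)
```

This narrows the results but does not by itself prove that the remaining requests came from the operator under test. Running the test on an otherwise idle cluster, or comparing metrics before and after applying the CRs, may be needed as well.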

Do we have an example scenario?

An operator that uses APIs removed in 1.25, which we can use for testing, is https://github.com/keycloak/keycloak-operator. See: https://github.com/keycloak/keycloak-operator/blob/996c21bca9f1d948e784d4f1ef5caaba5088944e/pkg/model/postgresql_aws_periodic_backup.go#L6. The deprecated/removed API is called when reconciling the KeycloakBackup CR, because the operator creates the batch resource on the cluster using k8s.io/api/batch/v1beta1.

How to create a custom Scorecard test? 

https://sdk.operatorframework.io/docs/testing-operators/scorecard/custom-tests/

What would this custom Scorecard test need to be able to do?

  • The test ought to identify all available CRs and apply them.
  • The test should then check the k8s metrics for deprecated API usage in order to produce warnings.
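A custom Scorecard test reports its outcome as JSON on stdout. The sketch below shows one way the gathered deprecations could be turned into such a result. The field names (results, name, state, suggestions) follow my understanding of the Scorecard v1alpha3 test status format and should be checked against the custom-tests documentation linked above; the test name deprecated-apis and the decision to pass with suggestions instead of failing are assumptions made for this example.

```python
import json


def build_test_status(deprecated):
    """Build a Scorecard-style status dict from deprecated-API label sets."""
    suggestions = [
        f"{api.get('resource', '')} in {api.get('group', '')}/{api.get('version', '')} "
        f"is deprecated and removed in {api.get('removed_release', 'unknown')}"
        for api in deprecated
    ]
    result = {
        "name": "deprecated-apis",  # hypothetical test name
        "state": "pass",            # warnings only: do not fail the test
    }
    if suggestions:
        result["suggestions"] = suggestions
    return {"results": [result]}


if __name__ == "__main__":
    # Illustrative input; real label sets would come from the k8s metrics.
    found = [{"group": "batch", "version": "v1beta1",
              "resource": "cronjobs", "removed_release": "1.25"}]
    print(json.dumps(build_test_status(found), indent=2))
```

Reporting warnings as suggestions keeps the test informational, which matches the request to be "informed beforehand" instead of blocking a release.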

camilamacedo86 avatar Feb 14 '22 10:02 camilamacedo86