kuttl Suite fails when an `errors` file references unknown resource type

trafficstars

What happened:

One of my steps has a file 10-errors.yaml which contains (among other things):

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: foo
---
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: bar

This runs fine on OpenShift. These objects are missing as expected and test passes.

However on non-OpenShift cluster I see:

    case.go:361: failed in step 10-my-step
    case.go:363: retrieving API resource for security.openshift.io/v1, Kind=SecurityContextConstraints failed: the server could not find the requested resource
    case.go:363: retrieving API resource for security.openshift.io/v1, Kind=SecurityContextConstraints failed: the server could not find the requested resource

And the test fails.

What you expected to happen:

The test should pass, since the absence of the given Kind on the API server guarantees the absence of objects of that kind (which is what the errors file is trying to assert).

How to reproduce it (as minimally and precisely as possible):

Drop the above file into any test on non-openshift cluster. Or in fact a definition of any object whose Kind is not known, on any cluster.

Anything else we need to know?:

Obviously a server could not find the requested resource should still be fatal when processing an assert file.

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:15:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

KUTTL version (use kubectl kuttl version): KUTTL Version: version.Info{GitVersion:"0.11.0", GitCommit:"1a31524", BuildDate:"2021-07-14T18:52:00Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Cloud provider or hardware configuration: docker for mac
OS (e.g. from /etc/os-release): MacOS 11.5.2
Kernel (e.g. uname -a): Darwin Marcins-MacBook-Pro 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64

Aug 27 '21 06:08 porridge

Hey @porridge ! Great to hear from you and great conversation point. I could see a missing CRD as a test setup issue.. but I could see your point as well. If we are looking for the absence of something... it seems most flexible to implement your proposal here with a log for difference. Looking for feedback from others who might see things differently.. but I'm a fan of implementing this feature to provide greater portability.

Sep 02 '21 19:09 kensipe

Assigning to next release... still open to feedback and concerns

Sep 02 '21 19:09 kensipe

IMO non-existing and unknown type are two different things that should not be mixed. To workaround this problem, you can install a "fake" Openshift CRD in your non-Openshift cluster. We did that when running tests in k3s before removing our use of Openshift-specific resource types.

Sep 02 '21 19:09 erikgb

that was my initial thinking as well... I was resistant to conflate the 2 concepts... but it seems reasonable to say if the test asserts that a resource doesn't exist... that if the resource can't exist, then it doesn't. It is good to get more thoughts around this. Perhaps a strictness flag on the errors struct.

Sep 02 '21 21:09 kensipe

@erikgb thanks for your thoughts, but I beg to disagree :-)

non-existing and unknown type are two different things

This is true.

that should not be mixed

But I disagree here. The former is implied by the latter. The are not orthogonal.

To workaround this problem, you can install a "fake" Openshift CRD in your non-Openshift cluster

The problem with this approach is that I've seen some places where code tries to auto-detect whether it's running against an OpenShift cluster by looking for the availability of some openshift-specific resources. I could see things go very badly from there :-/

Also, since CRDs are cluster scoped this introduces two additional problems:

they are no longer cleaned up by kuttl automatically at the end of the test (correct me if I'm wrong, but I think the cleanup is limited to the ephemeral namespaces created by kuttl)
they typically require cluster-admin privileges, which in general is not something that one can expect from every kuttl user

Sep 03 '21 04:09 porridge

The problem with this approach is that I've seen some places where code tries to auto-detect whether it's running against an OpenShift cluster by looking for the availability of some openshift-specific resources. I could see things go very badly from there :-/

Then I would like to know what is the use case here? I thought you were running a simple throwaway cluster to test your software, and the CRD hack is a simple one. 😉

If you really have to test code built for a particular K8s distribution like Openshift, I would then suggest to separate your Openshift tests into a separate suite. We do that, when testing our App-operator against Openshift, to verify that the Openshift Ingress controller creates the correct Openshift Routes from the Ingress.

tests/
├── e2e
│   ├── default-app
│   │   └── resources
│   ├── env-app
│   ├── max-app
│   │   ├── expected
│   │   └── resources
│   ├── min-app
│   ├── scale-app
│   ├── webhook-defaulting
│   ├── webhook-validating
│   │   ├── expected
│   │   └── resources
│   └── zero-app
│       ├── expected
│       └── resources
└── openshift
    └── ingress-route

What I would like to avoid, is a situation where a test passes when it should have failed. And I think your suggestion is a such. Let me take an example:

We are now in the process of preparing our software for K8s 1.22 - where a lot of things have been removed. We might have the following error-assert as part of our kuttl tests (min-app should NOT exist):

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: min-app

Ingress v1beta1 is one of the types that is removed in K8s 1.22, so someone picks up the task to remove the dependency to the obsolete version. I do not want my test to pass if min-app Ingress v1 exists after the change.....

IMHO I think we should build tools that encourages people to do things right. And treating non-existing and unknown type the same, even with an option, should be avoided. And while Openshift is a great K8s distribution, we should not urge people to write software for Openshift.

Sep 03 '21 05:09 erikgb

I think what the suggestion ultimately boils down to is to allow parameterized kuttl test suites. An option to allow suppressing a check just for the specific case of resource unavailability is odd. But a general mechanism for conditional assertions and parameterized suites, not so much.

@erikgb in the example you give regarding ingress, what would be a solution to write a kuttl test suite that works to assert the absence of an Ingress both for Kubernetes versions with only either of v1beta1 or v1? After all, OpenShift 3.11 is still very much supported, so there is certainly at least a commercial need for working within such an environment.

Of course, one could have multiple test suites, each being isolated to only a single feature (ingress or not etc.). But Kuttl neither enforces nor encourages writing test suites like this - for good reasons, I assume - and therefore I believe that a real feature gap remains.

Nov 15 '21 13:11 misberner

I do not mind having support for Openshift, as we also runs our stuff on Openshift. What I would not like to see, is kuttl threating "I do not know the resource type" and "The resource does not exist" equally. That would be wrong IMO.

Nov 15 '21 14:11 erikgb

kuttl kuttl copied to clipboard

Suite fails when an `errors` file references unknown resource type

kuttl
kuttl copied to clipboard