serving-operator icon indicating copy to clipboard operation
serving-operator copied to clipboard

Validation Script

Open trshafer opened this issue 5 years ago • 24 comments

In what area(s)?

/area API /area autoscale /area build /area monitoring /area networking /area test-and-release

Describe the feature

It would be helpful to have a validation script that took in a kubeconfig and gave a pass/fail or likelihood of passing if knative could be run on the cluster. The script would be run by a cluster admin and validate that the kubernetes cluster is properly setup to support knative.

It would be run like: validate-cluster.sh --kubeconfig=/path/to/kubeconfig

I'm not immediately sure what all the checks would be, one initial suggestion by @dgerd is the initial check would simply check the k8s version and perhaps a version of a supported ingress/gateways services.

trshafer avatar Apr 12 '19 18:04 trshafer

I would like to work on this item.

yu2003w avatar Jun 21 '19 08:06 yu2003w

/assign @yu2003w

houshengbo avatar Jun 21 '19 20:06 houshengbo

@houshengbo: GitHub didn't allow me to assign the following users: yu2003w.

Note that only knative members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @yu2003w

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot avatar Jun 21 '19 20:06 knative-prow-robot

@yu2003w Are u a member of Knative? Why cannot I assign it to you?

houshengbo avatar Jun 21 '19 20:06 houshengbo

@houshengbo I'm not member of knative.

yu2003w avatar Jun 22 '19 04:06 yu2003w

@yu2003w You definitely need to join the knative community as a member. However, you can start working on it anyhow

@k4leung4 @greghaynes Since we folks have gone the process of proposing knative member before, could you propose @yu2003w as a member of knative? I can be another member proposing @yu2003w if needed.

houshengbo avatar Jun 24 '19 22:06 houshengbo

validation scripts check following items: 1, k8s cluster version 2, Istio version and following services deployed cluster-local-gateway istio-ingressgateway 3, cert manager installed

yu2003w avatar Jun 26 '19 09:06 yu2003w

Thanks for working on this. What do we think about rather than a shell script we make this go code which our operator can run? I expect we will want our operator to perform checks like this as part of normal operation and it will be difficult to bake in a shell script which it runs.

greghaynes avatar Jun 26 '19 18:06 greghaynes

Operator should perform the checks before installing Knative Serving. Does operator need to check the conditions such as ingress gateway periodically after knative serving installed? Does operator support to install serving with Gloo preinstalled rather than istio?

yu2003w avatar Jun 27 '19 02:06 yu2003w

Operator should perform the checks before installing Knative Serving. Yes. I am open to the idea of updating the KnativeServing CRD to add the desired dependency configuration. @houshengbo @greghaynes and @trshafer jcrossley3

Something like this?

spec:
  deps:
    istio: true
    gloo: false
    mtls: false

Does operator need to check the conditions such as ingress gateway periodically after knative serving installed? I don't think it's necessary for the operator to be continually checking conditions for ingress gateway. Actually the more I think about this, I think it might make sense for the operator to be in a caution state if the ingress gateway goes away. One thing I'm thinking of is upgrading a cluster from one version to another. Upgrading the operator should upgrade the knative config and report any dependency issues. So if we're upgrading in a cluster where the operator is in a "caution" state, that should probably still update knative-serving.

I'm also thinking if upgrading the operator means knative has a backward incompatible change with a prior version of istio. It would not want to squash an existing istio configuration, but also it should fail to install.

Does operator support to install serving with Gloo preinstalled rather than istio? I think it's still undecided if the operator should install the dependencies or just report the dependencies are uninstalled. We definitely don't want the operator to override an existing istio/gloo installation.

trshafer avatar Jun 27 '19 15:06 trshafer

@greghaynes Per your comments, instead of running scripts to validate. We can use the kind job to run commands for the validations, like the following pods in istio:

istio-cleanup-secrets-1.2.0-wblhw         0/1     Completed   0          2m43s
istio-init-crd-10-qcfkp                   0/1     Completed   0          2m55s
istio-init-crd-11-c9sps                   0/1     Completed   0          2m54s
istio-init-crd-12-v7v8z                   0/1     Completed   0          2m54s
istio-security-post-install-1.2.0-tc2p5   0/1     Completed   0          2m42s

We can have a pod named "knative-serving-operator-dependency-validation"(or multiple pods if necessary) running the validation one by one. Completed means the validation has passed. If it fails, we can either try to fix it, or log the error message. Any thoughts on that?

@trshafer @yu2003w We can list the desired dependencies, and specify whether to validate them in a ConfigMap, for example? The pod "knative-serving-operator-dependency-validation" can access this ConfigMap and run the validate one by one.

This is currently some of my consideration. Welcome your comments.

houshengbo avatar Jun 28 '19 19:06 houshengbo

Do we need to run all validation tasks in one job and check error message to determine which failed? Or is validation divided into several jobs?

yu2003w avatar Jul 03 '19 14:07 yu2003w

What value do we get from the additional complexity of a separate job and possibly a configmap when we already have an operator and a CRD? I think we agreed in the recent WG meeting that the best way to communicate the status of the installation (including any missing deps) is through the status field of the custom resource. And I think the best way to do that is by having the operator invoke any necessary validation logic.

jcrossley3 avatar Jul 03 '19 15:07 jcrossley3

Thanks for the comments. I will try to update CRD and send PR later this week.

yu2003w avatar Jul 08 '19 16:07 yu2003w

A few comments on script vs-CRD:

  • Being able to run these on an ongoing basis would be great. This is probably pro-CRD.
  • There are a set of getting-started problems that may be difficult to diagnose which happen before the CRD and operator are running on the cluster. Some examples (these are mostly fixed, but there may be others):
  • It may be desirable to capture the output of the tool for later analysis. apiserver CRD storage might be good for this, though a text file is probably also fine.
  • Whatever is running the system checks may need a fairly broad RBAC role (though hopefully this would mostly be "read" capabilities). This may run into some of the same problems as Tiller. (Though, see point 1 about running on an ongoing basis.)

evankanderson avatar Jul 08 '19 18:07 evankanderson

For the dependencies, shall we also check the version? For example, for istio, except validating ingress/cluster-local gateway working as expected, shall we check wether Istio-version is meet the requirement? Maybe version is in a range of 1.1.3,1.1.7, 1.2.0.

yu2003w avatar Jul 18 '19 11:07 yu2003w

I think so, it makes sense for knative to support a range of istio installations. Probably makes sense to be a wildcard on patch. Currently the installation guide supports one version, 1.1.7: https://knative.dev/docs/install/installing-istio/#downloading-istio-and-installing-crds

trshafer avatar Jul 18 '19 18:07 trshafer

@trshafer what's your meaning of patch?

yu2003w avatar Jul 19 '19 08:07 yu2003w

The third number in a release: Major.Minor.Patch.

trshafer avatar Jul 19 '19 16:07 trshafer

@trshafer How could istio/certmanager version be retrieved? From image tag or other? I need some advice. For istio, version could be get from at least two points: 1, "productVersion" in annonation

Annotations:        productID: istio-galley_1.1.7_apachev2_00000
                    productName: istio-galley
                    productVersion: 1.1.7
                    sidecar.istio.io/inject: false

2, image tag now istio images are tagged with version. {docker repo}/istio-galley:1.1.7

For certmanager, it seems that image tag is the only approach to get version.

quay.io/jetstack/cert-manager-controller:v0.6.1

Which do you prefer? Or is there any other conventional method?

For patch, could we use a smallest number rather than wildcard? Take istio as an example, now istio 1.1.2, 1.1.3, 1.1.7 and 1.2.0 could be installed, if wildcard used and user installed istio 1.1.0, it will pass validation check. However, maybe istio 1.1.0 couldn't be supported. If using istio 1.1.2 as the baseline version, versions 1.1.2 and later could pass validation check. Do you think this make sense?

yu2003w avatar Jul 23 '19 07:07 yu2003w

I don't think it can reliably be used from the image tag. Similarly how users can change the format of the image digest in the operator using the KnativeServing CRD, users may change the istio image digest.

Does this work for getting the istio version? It grabs the istio version from the environment variable in the proxy container?

ISTIO_PILOT_POD=$(kubectl get pod --selector=istio=pilot -n istio-system -o jsonpath='{$.items[0].metadata.name}') kubectl exec -it $ISTIO_PILOT_POD -n istio-system --container istio-proxy env | grep VERSION

trshafer avatar Jul 29 '19 20:07 trshafer

@trshafer Thanks a lot. I think we should rely on ISTIO_META_ISTIO_VERSION=1.1.7 to get Istio version.

yu2003w avatar Aug 07 '19 10:08 yu2003w

@trshafer I have updated code and got Istio version from Istio-pilot pod. Currently version of cert-manager should be the image tag. Developer of cert-manager confirmed this.

yu2003w avatar Aug 09 '19 16:08 yu2003w

@trshafer For istio 1.2.4, in isitio-pilot pod, env ISTIO_META_ISTIO_VERSION is not set the right value.

root@istio-pilot-5fc44889bf-8bm5v:/# env | grep VERSION
ISTIO_META_ISTIO_VERSION=1.0-dev
ISTIO_META_ISTIO_PROXY_VERSION=1.1.3

I'm afraid we couldn't depend on env to get istio version. Maybe we need to get it from image tag. Any thoughts about this?

yu2003w avatar Sep 17 '19 14:09 yu2003w