serving-operator
Validation Script
In what area(s)?
/area API /area autoscale /area build /area monitoring /area networking /area test-and-release
Describe the feature
It would be helpful to have a validation script that takes in a kubeconfig and gives a pass/fail (or a likelihood of passing) indicating whether Knative can run on the cluster. The script would be run by a cluster admin and would validate that the Kubernetes cluster is properly set up to support Knative.
It would be run like: validate-cluster.sh --kubeconfig=/path/to/kubeconfig
I'm not immediately sure what all the checks would be; one initial suggestion by @dgerd is to simply check the Kubernetes version and perhaps the versions of supported ingress/gateway services.
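As a rough illustration of that initial check, here is a hedged sketch of what validate-cluster.sh could look like; the minimum version, helper names, and output format are all assumptions, not decided behavior:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed validate-cluster.sh; the minimum
# version and the PASS/FAIL output format are illustrative only.
set -u

MIN_K8S_VERSION="1.14.0"

# version_ge A B: succeed if dotted version A >= B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

check_k8s_version() {
  # Pull the server gitVersion out of `kubectl version -o json`.
  server=$(kubectl version -o json 2>/dev/null \
    | sed -n 's/.*"gitVersion": *"v\([0-9.]*\)".*/\1/p' | head -n1)
  if [ -z "$server" ]; then
    echo "WARN: could not determine server version (is kubectl configured?)"
  elif version_ge "$server" "$MIN_K8S_VERSION"; then
    echo "PASS: Kubernetes $server >= $MIN_K8S_VERSION"
  else
    echo "FAIL: Kubernetes $server < $MIN_K8S_VERSION"
    return 1
  fi
}

check_k8s_version || true
```

A real script would take the proposed --kubeconfig flag and run further checks; this only shows the version-gate shape.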
I would like to work on this item.
/assign @yu2003w
@houshengbo: GitHub didn't allow me to assign the following users: yu2003w.
Note that only knative members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
@yu2003w Are you a member of Knative? Why can't I assign it to you?
@houshengbo I'm not a member of Knative.
@yu2003w You definitely need to join the Knative community as a member. However, you can start working on it in the meantime.
@k4leung4 @greghaynes Since you folks have gone through the process of proposing Knative members before, could you propose @yu2003w as a member of Knative? I can be another proposing member for @yu2003w if needed.
The validation script checks the following items:
1. the Kubernetes cluster version
2. the Istio version, and that the following services are deployed: cluster-local-gateway, istio-ingressgateway
3. that cert-manager is installed
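Item 2's gateway checks could be sketched like this; check_service is a hypothetical helper, and the istio-system namespace and service names assume a default Istio install:

```shell
# Hypothetical helper for the gateway checks above; namespace and service
# names assume a default Istio installation.
check_service() {
  ns="$1"; name="$2"
  if kubectl get svc "$name" -n "$ns" >/dev/null 2>&1; then
    echo "PASS: Service $ns/$name found"
  else
    echo "FAIL: Service $ns/$name missing"
    return 1
  fi
}

check_service istio-system istio-ingressgateway || true
check_service istio-system cluster-local-gateway || true
```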
Thanks for working on this. Rather than a shell script, what do we think about making this Go code that our operator can run? I expect we will want our operator to perform checks like this as part of normal operation, and it would be difficult to bake in a shell script for it to run.
The operator should perform the checks before installing Knative Serving. Does the operator need to check conditions such as the ingress gateway periodically after Knative Serving is installed? Does the operator support installing Serving with Gloo preinstalled rather than Istio?
The operator should perform the checks before installing Knative Serving.
Yes. I am open to the idea of updating the KnativeServing CRD to add the desired dependency configuration. @houshengbo @greghaynes @trshafer @jcrossley3
Something like this?
spec:
  deps:
    istio: true
    gloo: false
    mtls: false
Does the operator need to check conditions such as the ingress gateway periodically after Knative Serving is installed?
I don't think it's necessary for the operator to be continually checking conditions for the ingress gateway. Actually, the more I think about this, I think it might make sense for the operator to be in a "caution" state if the ingress gateway goes away. One thing I'm thinking of is upgrading a cluster from one version to another. Upgrading the operator should upgrade the Knative config and report any dependency issues. So if we're upgrading in a cluster where the operator is in a "caution" state, that should probably still update knative-serving.
I'm also thinking about the case where upgrading the operator means Knative has a backward-incompatible change with a prior version of Istio. The operator would not want to squash an existing Istio configuration, but it should also fail to install.
Does the operator support installing Serving with Gloo preinstalled rather than Istio?
I think it's still undecided whether the operator should install the dependencies or just report that the dependencies are not installed. We definitely don't want the operator to override an existing Istio/Gloo installation.
@greghaynes Per your comments, instead of running scripts to validate, we can use a Job (the Kubernetes kind) to run commands for the validations, like the following pods in Istio:
istio-cleanup-secrets-1.2.0-wblhw 0/1 Completed 0 2m43s
istio-init-crd-10-qcfkp 0/1 Completed 0 2m55s
istio-init-crd-11-c9sps 0/1 Completed 0 2m54s
istio-init-crd-12-v7v8z 0/1 Completed 0 2m54s
istio-security-post-install-1.2.0-tc2p5 0/1 Completed 0 2m42s
We can have a pod named "knative-serving-operator-dependency-validation" (or multiple pods if necessary) running the validations one by one. Completed means the validation has passed. If it fails, we can either try to fix it or log the error message. Any thoughts on that?
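For illustration, a Job for that pod might look roughly like the following; the image, command, and namespace are placeholders, not an agreed design:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: knative-serving-operator-dependency-validation
  namespace: knative-serving
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: validate
        # Placeholder image and command; the real validation binary is TBD.
        image: example.com/knative/dependency-validation
        command: ["/validate", "--check=istio", "--check=cert-manager"]
```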
@trshafer @yu2003w We can list the desired dependencies and specify whether to validate them in a ConfigMap, for example? The pod "knative-serving-operator-dependency-validation" can access this ConfigMap and run the validations one by one.
These are some of my current thoughts. Your comments are welcome.
Do we need to run all validation tasks in one Job and check the error messages to determine which failed? Or should validation be divided into several Jobs?
What value do we get from the additional complexity of a separate job and possibly a configmap when we already have an operator and a CRD? I think we agreed in the recent WG meeting that the best way to communicate the status of the installation (including any missing deps) is through the status field of the custom resource. And I think the best way to do that is by having the operator invoke any necessary validation logic.
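For example, a missing dependency could be surfaced through a status condition on the custom resource; the condition type, reason, and message below are illustrative, not the actual API:

```yaml
status:
  conditions:
  - type: DependenciesInstalled
    status: "False"
    reason: IstioGatewayMissing
    message: "Service istio-system/cluster-local-gateway not found"
```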
Thanks for the comments. I will try to update the CRD and send a PR later this week.
A few comments on script vs. CRD:
- Being able to run these on an ongoing basis would be great. This is probably pro-CRD.
- There are a set of getting-started problems that may be difficult to diagnose which happen before the CRD and operator are running on the cluster. Some examples (these are mostly fixed, but there may be others):
- Calling role does not have sufficient RBAC (this is a sample from GKE that I'm aware of).
- Certain Kubernetes flags or addons are disabled in the cluster (e.g. mutating admission control)
- It may be desirable to capture the output of the tool for later analysis. apiserver CRD storage might be good for this, though a text file is probably also fine.
- Whatever is running the system checks may need a fairly broad RBAC role (though hopefully this would mostly be "read" capabilities). This may run into some of the same problems as Tiller. (Though, see point 1 about running on an ongoing basis.)
For the dependencies, shall we also check the version? For example, for Istio, besides validating that the ingress/cluster-local gateways work as expected, shall we check whether the Istio version meets the requirement? Maybe the version should be in a range such as 1.1.3, 1.1.7, 1.2.0.
I think so; it makes sense for Knative to support a range of Istio installations. It probably makes sense to use a wildcard on patch. Currently the installation guide supports one version, 1.1.7: https://knative.dev/docs/install/installing-istio/#downloading-istio-and-installing-crds
@trshafer What do you mean by patch?
The third number in a release: Major.Minor.Patch.
@trshafer How can the Istio/cert-manager version be retrieved? From the image tag, or somewhere else? I need some advice. For Istio, the version can be obtained from at least two places:
1. the "productVersion" annotation:
Annotations: productID: istio-galley_1.1.7_apachev2_00000
             productName: istio-galley
             productVersion: 1.1.7
             sidecar.istio.io/inject: false
2. the image tag:
Istio images are now tagged with the version: {docker repo}/istio-galley:1.1.7
For cert-manager, it seems the image tag is the only way to get the version:
quay.io/jetstack/cert-manager-controller:v0.6.1
Which do you prefer? Or is there any other conventional method?
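Extracting a version from an image tag can be sketched as below; image_tag_version is a hypothetical helper and assumes the reference actually ends in a tag:

```shell
# Hypothetical helper: derive a version from an image reference's tag,
# e.g. quay.io/jetstack/cert-manager-controller:v0.6.1 -> 0.6.1.
image_tag_version() {
  tag="${1##*:}"   # text after the last ':' (assumes a tag is present)
  echo "${tag#v}"  # drop a leading 'v' if present
}

image_tag_version quay.io/jetstack/cert-manager-controller:v0.6.1
image_tag_version istio/istio-galley:1.1.7
```

In a real check the image reference would come from something like `kubectl get deploy -o jsonpath=...` against the installed workloads.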
For patch, could we use a smallest baseline number rather than a wildcard?
Take Istio as an example: currently Istio 1.1.2, 1.1.3, 1.1.7, and 1.2.0 can be installed. If a wildcard is used and a user installs Istio 1.1.0, it will pass the validation check; however, Istio 1.1.0 may not be supported. If we use Istio 1.1.2 as the baseline version, versions 1.1.2 and later would pass the validation check. Do you think this makes sense?
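The baseline idea could be sketched as a small shell check; meets_baseline is a hypothetical helper that only handles three-part versions:

```shell
# Hypothetical baseline check: accept a version when its major.minor match
# the baseline and its patch is >= the baseline patch.
meets_baseline() {
  version="$1"; baseline="$2"
  [ "${version%.*}" = "${baseline%.*}" ] || return 1  # major.minor must match
  [ "${version##*.}" -ge "${baseline##*.}" ]          # patch >= baseline patch
}

meets_baseline 1.1.7 1.1.2 && echo "1.1.7 accepted"
meets_baseline 1.1.0 1.1.2 || echo "1.1.0 rejected"
```

With a patch wildcard, 1.1.0 would be accepted; this check rejects it, matching the baseline proposal. To also support 1.2.x, the check would be run once per supported minor line.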
I don't think the version can reliably be obtained from the image tag. Similarly to how users can change the image digest in the operator using the KnativeServing CRD, users may change the Istio image digest.
Does this work for getting the Istio version? It grabs the Istio version from an environment variable in the proxy container:
ISTIO_PILOT_POD=$(kubectl get pod --selector=istio=pilot -n istio-system -o jsonpath='{$.items[0].metadata.name}')
kubectl exec -it $ISTIO_PILOT_POD -n istio-system --container istio-proxy env | grep VERSION
@trshafer Thanks a lot. I think we should rely on ISTIO_META_ISTIO_VERSION=1.1.7 to get the Istio version.
@trshafer I have updated the code to get the Istio version from the istio-pilot pod. Currently the cert-manager version is taken from the image tag; a cert-manager developer confirmed this.
@trshafer For Istio 1.2.4, in the istio-pilot pod, the env var ISTIO_META_ISTIO_VERSION is not set to the right value:
root@istio-pilot-5fc44889bf-8bm5v:/# env | grep VERSION
ISTIO_META_ISTIO_VERSION=1.0-dev
ISTIO_META_ISTIO_PROXY_VERSION=1.1.3
I'm afraid we can't depend on the env var to get the Istio version. Maybe we need to get it from the image tag. Any thoughts about this?