
Fix k8s monolith external nsc test instability.

Open VitalyGushin opened this issue 11 months ago • 2 comments

https://github.com/networkservicemesh/deployments-k8s/issues/11229

The problem is that we start applying the test case immediately after NSM is applied, without waiting for NSM to become ready. The check that the registry service is exposed succeeds right after the NSM kustomization is applied, when the NSM pods have not yet started. The log below shows the test case being applied in the same second and the admission webhook refusing the connection:

time=2024-02-21T10:00:49Z level=info msg=kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/k8s_monolith/configuration/cluster?ref=e256ac43309ae0e4fc5605b7dce1f26e2b93bc63 TestK8sMonolithSuite/External_nsc=stdin
time=2024-02-21T10:00:51Z level=info msg=namespace/nsm-system created
customresourcedefinition.apiextensions.k8s.io/networkserviceendpoints.networkservicemesh.io unchanged
customresourcedefinition.apiextensions.k8s.io/networkservices.networkservicemesh.io unchanged
serviceaccount/admission-webhook-sa created
serviceaccount/nsmgr-proxy-sa created
serviceaccount/nsmgr-sa created
serviceaccount/registry-k8s-sa created
clusterrole.rbac.authorization.k8s.io/admission-webhook-role unchanged
clusterrole.rbac.authorization.k8s.io/nsmgr-binding-role unchanged
clusterrole.rbac.authorization.k8s.io/nsmgr-proxy-binding-role created
clusterrole.rbac.authorization.k8s.io/registry-k8s-role unchanged
clusterrolebinding.rbac.authorization.k8s.io/admission-webhook-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/nsmgr-binding unchanged
clusterrolebinding.rbac.authorization.k8s.io/nsmgr-proxy-binding created
clusterrolebinding.rbac.authorization.k8s.io/registry-k8s-role-binding unchanged
service/admission-webhook-svc created
service/nsmgr-proxy created
service/registry created
deployment.apps/admission-webhook-k8s created
deployment.apps/nsmgr-proxy created
deployment.apps/registry-k8s created
daemonset.apps/forwarder-vpp created
daemonset.apps/nsmgr created
mutatingwebhookconfiguration.admissionregistration.k8s.io/nsm-mutating-webhook created TestK8sMonolithSuite/External_nsc=stdout
time=2024-02-21T10:00:51Z level=info msg=# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically. TestK8sMonolithSuite/External_nsc=stderr
time=2024-02-21T10:00:51Z level=info msg=kubectl get services registry -n nsm-system -o go-template='{{index (index (index (index .status "loadBalancer") "ingress") 0) "ip"}}' TestK8sMonolithSuite/External_nsc=stdin
time=2024-02-21T10:00:51Z level=info msg=172.18.1.130 TestK8sMonolithSuite/External_nsc=stdout
=== RUN   TestK8sMonolithSuite/External_nsc/TestKernel2IP2Kernel
time=2024-02-21T10:00:51Z level=info msg=kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/k8s_monolith/external_nsc/usecases/Kernel2IP2Kernel?ref=e256ac43309ae0e4fc5605b7dce1f26e2b93bc63 TestK8sMonolithSuite/External_nsc/TestKernel2IP2Kernel=stdin
time=2024-02-21T10:00:52Z level=info msg=namespace/ns-kernel2ip2kernel-monolith-nsc created
networkservice.networkservicemesh.io/kernel2ip2kernel-monolith-nsc created TestK8sMonolithSuite/External_nsc/TestKernel2IP2Kernel=stdout
time=2024-02-21T10:00:52Z level=info msg=# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
Error from server (InternalError): error when creating "https://github.com/networkservicemesh/deployments-k8s/examples/k8s_monolith/external_nsc/usecases/Kernel2IP2Kernel?ref=e256ac43309ae0e4fc5605b7dce1f26e2b93bc63": Internal error occurred: failed calling webhook "nsm-mutating-webhook.networkservicemesh.io": failed to call webhook: Post "https://admission-webhook-svc.nsm-system.svc:443/mutate?timeout=10s": dial tcp 10.96.160.219:443: connect: connection refused TestK8sMonolithSuite/External_nsc/TestKernel2IP2Kernel=stderr
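
One possible way to close this gap is to block on the rollout of every NSM workload before applying the test case. The sketch below is only an illustration, not the fix that was merged: the workload names and the `nsm-system` namespace are taken from the apply output above, while the helper names `rollout_commands` and `wait_for_nsm` and the `1m` timeout are assumptions made for this example.

```python
import subprocess

# Workloads created by the NSM kustomization, as listed in the apply output.
NSM_WORKLOADS = [
    "deployment/admission-webhook-k8s",
    "deployment/nsmgr-proxy",
    "deployment/registry-k8s",
    "daemonset/forwarder-vpp",
    "daemonset/nsmgr",
]


def rollout_commands(namespace="nsm-system", timeout="1m"):
    """Build one `kubectl rollout status` command per NSM workload."""
    return [
        ["kubectl", "rollout", "status", workload,
         "-n", namespace, f"--timeout={timeout}"]
        for workload in NSM_WORKLOADS
    ]


def wait_for_nsm(run=subprocess.run):
    """Wait for each workload in turn; raise if any rollout does not finish.

    `run` is injectable so the function can be exercised without a cluster.
    """
    for cmd in rollout_commands():
        # check=True turns a non-zero kubectl exit code into an exception.
        run(cmd, check=True)
```

Calling `wait_for_nsm()` between the NSM apply step and the test-case apply step would make the test wait for the admission webhook deployment, among others, instead of relying only on the registry service check.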

VitalyGushin, Feb 29 '24 14:02

I'm not sure this is the problem. You are right that we sometimes see errors. But the integration tests are designed so that if a command returns a non-zero exit code, we retry it until it either succeeds or the timeout expires (1m by default). Here we can see that the command succeeded in the end, yet the test still failed afterwards. https://github.com/networkservicemesh/integration-k8s-kind/actions/runs/7986677284/job/21812339675?pr=976#step:9:7018
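
The retry behaviour described here can be pictured roughly as follows. This is a minimal sketch of the general pattern, not the integration-tests code itself: the function name `retry_until_success`, its parameters, and the one-second interval are hypothetical, and only the 1m default timeout comes from the comment.

```python
import time


def retry_until_success(step, timeout=60.0, interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call step() until it returns exit code 0 or the timeout expires.

    Returns every exit code observed, so intermediate failures stay
    visible even when the final attempt succeeds.
    """
    deadline = clock() + timeout
    codes = []
    while True:
        code = step()
        codes.append(code)
        if code == 0:
            return codes
        if clock() >= deadline:
            raise TimeoutError(f"step still failing after {len(codes)} attempts")
        sleep(interval)
```

With a loop like this, a `connection refused` error on the first attempt appears in the log but does not by itself fail the test, provided a later attempt succeeds within the timeout.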

glazychev-art, Mar 01 '24 08:03

Either way, this is a flaw in the test scenario that needs to be fixed. In addition, a ping error occurs immediately after this error, so the two may be related. I suggest merging this fix and seeing whether the test result changes.

VitalyGushin, Mar 05 '24 06:03