
k8s-dns e2e test suite failing with exit status 1 at HEAD

Open DamianSawicki opened this issue 1 year ago • 6 comments

pull-kubernetes-dns-test fails at HEAD (verified for the no-op PR https://github.com/kubernetes/dns/pull/645) as below:

...
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 sidecar started
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 running `dig`
2024/10/06 16:17:58 test | 2024/10/06 16:17:53 Waiting for hits to be reported to be greater than 100
2024/10/06 16:17:58 test | 
2024/10/06 16:17:58 All tests passed
2024/10/06 16:17:58 docker [rmi -f k8s-dns-sidecar-e2e-test]
Running Suite: k8s-dns e2e test suite
=====================================
Random Seed: 1728231478
Will run 5 of 5 specs
2024/10/06 16:18:20 exit status 1
Ginkgo ran 1 suite in 21.764852525s
Test Suite Failed

This most probably blocks the vulnerability-fix PR https://github.com/kubernetes/dns/pull/638, which has been open since July and whose tests are failing identically.

For the last merged PR https://github.com/kubernetes/dns/pull/635 the test pull-kubernetes-dns-test passed, so apparently the tests or test infra must have changed in the meantime. For https://github.com/kubernetes/dns/pull/638, the test failed identically on July 23rd, July 29th, and September 14th, so the issue seems to predate the August 2024 Prow migration.

DamianSawicki • Oct 06 '24 18:10

I think the failing test is defined in test/e2e/e2e_test.go in the present repo. That file has not been modified since https://github.com/kubernetes/dns/pull/635, so this looks more like an infra thing.

When I tried to run the test locally, I got the message "2024/10/06 21:08:39 e2e test requires `sudo` to be active. Run `sudo -v` before running the e2e test.", so perhaps it is a matter of permissions?
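One way to check up front whether `sudo` credentials are cached is `sudo -n true`, which fails instead of prompting when a password would be needed. Below is a minimal Python sketch of that check. It is an assumption about what "sudo is active" amounts to, not the check the e2e test itself performs, and `sudo_is_active` is a hypothetical helper name.

```python
import shutil
import subprocess


def sudo_is_active(timeout_s: float = 10.0) -> bool:
    """Return True if `sudo` can run a command right now without prompting.

    Hypothetical helper: it only approximates the condition named in the
    e2e test's error message.
    """
    if shutil.which("sudo") is None:
        # No sudo binary on PATH, so it cannot be "active".
        return False
    try:
        # -n makes sudo fail rather than ask for a password.
        result = subprocess.run(
            ["sudo", "-n", "true"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if sudo_is_active():
        print("sudo is active")
    else:
        print("sudo is not active; run `sudo -v` first")
```

Running this before the e2e test locally would at least separate a local permissions problem from whatever is failing in CI.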

Also, in artifacts of the failed run, in the file podinfo.json, I've found the following:

				{
					"name": "test",
					"state": {
						"terminated": {
							"exitCode": 1,
							"reason": "Error",
							"message": " test | \n2024/10/06 16:17:58 All tests passed\n2024/10/06 16:17:58 docker [rmi -f k8s-dns-sidecar-e2e-test]\nRunning Suite: k8s-dns e2e test suite\n=====================================\nRandom Seed: \u001b[1m1728231478\u001b[0m\nWill run \u001b[1m5\u001b[0m of \u001b[1m5\u001b[0m specs\n\n2024/10/06 16:18:20 exit status 1\n\nGinkgo ran 1 suite in 21.764852525s\nTest Suite Failed\n\n\u001b[38;5;228mGinkgo 2.0 is coming soon!\u001b[0m\n\u001b[38;5;228m==========================\u001b[0m\n\u001b[1m\u001b[38;5;10mGinkgo 2.0\u001b[0m is under active development and will introduce several new features, improvements, and a small handful of breaking changes.\nA release candidate for 2.0 is now available and 2.0 should GA in Fall 2021.  \u001b[1mPlease give the RC a try and send us feedback!\u001b[0m\n  - To learn more, view the migration guide at \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md\u001b[0m\n  - For instructions on using the Release Candidate visit \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#using-the-beta\u001b[0m\n  - To comment, chime in at \u001b[38;5;14m\u001b[4mhttps://github.com/onsi/ginkgo/issues/711\u001b[0m\n\nTo \u001b[1m\u001b[38;5;204msilence this notice\u001b[0m, set the environment variable: \u001b[1mACK_GINKGO_RC=true\u001b[0m\nAlternatively you can: \u001b[1mtouch $HOME/.ack-ginkgo-rc\u001b[0m\n+ EXIT_VALUE=1\n+ set +o xtrace\nCleaning up after docker in docker.\n================================================================================\nWaiting 30 seconds for pods stopped with terminationGracePeriod:30\nCleaning up after docker\nWaiting for docker to stop for 30 seconds\nStopping Docker: dockerProgram process in pidfile '/var/run/docker-ssd.pid', 1 process(es), refused to die.\n================================================================================\nDone cleaning up after docker in 
docker.\n{\"component\":\"entrypoint\",\"error\":\"wrapped process failed: exit status 1\",\"file\":\"sigs.k8s.io/prow/pkg/entrypoint/run.go:84\",\"func\":\"sigs.k8s.io/prow/pkg/entrypoint.Options.internalRun\",\"level\":\"error\",\"msg\":\"Error executing test process\",\"severity\":\"error\",\"time\":\"2024-10-06T16:19:10Z\"}\n",
							"startedAt": "2024-10-06T15:55:53Z",
							"finishedAt": "2024-10-06T16:19:10Z",
							"containerID": "containerd://302c6068cdfb4c64dd8aafb8b56a4f61083e252a3c594e89249c2a568e443000"
						}
					},
					"lastState": {},
					"ready": false,
					"restartCount": 0,
					"image": "gcr.io/k8s-staging-test-infra/kubekins-e2e:v20240923-c8645c1a17-master",
					"imageID": "gcr.io/k8s-staging-test-infra/kubekins-e2e@sha256:c5cf57a29e78a568ecf90a3b5b4df6b2afd5245c97edda91759e3e07f2330ba7",
					"containerID": "containerd://302c6068cdfb4c64dd8aafb8b56a4f61083e252a3c594e89249c2a568e443000",
					"started": false
				}

which mentions kubekins-e2e, which seems to be deprecated.
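The message field above is hard to read because of the embedded ANSI color codes (the `\u001b[...m` sequences). Here is a small Python sketch for pulling a readable message out of a container status like this one. It assumes only the shape of the fragment shown above and makes no claim about the rest of podinfo.json. The name `readable_message` and the shortened sample are illustrative.

```python
import json
import re

# Matches ANSI CSI sequences such as ESC[1m, ESC[0m or ESC[38;5;228m.
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def readable_message(container_status: dict) -> str:
    """Return the terminated container's message with ANSI codes removed."""
    terminated = container_status.get("state", {}).get("terminated", {})
    return ANSI_CSI.sub("", terminated.get("message", ""))


# Shortened sample shaped like the fragment above (not the full message).
fragment = json.loads(r'''
{
  "name": "test",
  "state": {
    "terminated": {
      "exitCode": 1,
      "reason": "Error",
      "message": "Random Seed: \u001b[1m1728231478\u001b[0m\nWill run \u001b[1m5\u001b[0m of \u001b[1m5\u001b[0m specs\n"
    }
  }
}
''')

if __name__ == "__main__":
    print(readable_message(fragment))
```

With the color codes stripped, it is easier to see that the suite announces 5 specs and then exits with status 1 without printing any per-spec output.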

DamianSawicki • Oct 06 '24 21:10

Hey @BenTheElder, I found you among the owners of kubekins-e2e mentioned above. Would you be able to look at the comments above and possibly share some advice?

DamianSawicki • Oct 08 '24 18:10

I don't work in this repo, but kubekins-e2e is an image we use currently to run some CI in the kubernetes project. It has a grab bag of tools like docker. Any other usage is best-effort.

podinfo.json describes the pod in which we executed the PR tests. for more, see https://docs.prow.k8s.io/docs/jobs/ and https://github.com/kubernetes/test-infra (config/)

BenTheElder • Oct 08 '24 18:10

unless this project opted into it, the pod most likely ran as root, but it's hard to know without tracing the job specifics, e.g. you may have scheduled the test into the cluster under test (which is NOT the cluster we use to run CI; that one just executes the CI workloads, which then create disposable test clusters)

> seems to predate the August 2024 Prow migration.

that migration was for the control plane. migrating the workloads was done prior to this, and varies by workload.

you can find this job's definition in the test-infra repo and see the git history there.

we're currently approaching KEP Freeze, and I will be out for a few days after that, so time is tight this week 😅

BenTheElder • Oct 08 '24 18:10

Ben, thank you very much for your responses!

@VikashLNU @zhangguanzhang You can have a look at the comments above to try to unblock the PR https://github.com/kubernetes/dns/pull/638 you're interested in.

DamianSawicki • Oct 09 '24 09:10

> Ben, thank you very much for your responses!
>
> @VikashLNU @zhangguanzhang You can have a look at the comments above to try to unblock the PR #638 you're interested in.

I don't see how to resolve the issue, but once someone fixes the CI build problem, I can rebase my code onto the master branch and push it.

zhangguanzhang • Oct 10 '24 00:10

We should be good to close this issue now. https://github.com/kubernetes/dns/pull/651 addressed it.

dereknola • Nov 13 '24 17:11

Yeah, thank you very much again, @dereknola!

DamianSawicki • Nov 13 '24 19:11