Improve e2e tests to collect context information

Open hajnalmt opened this issue 4 months ago • 8 comments

What is the problem you're trying to solve

Currently, if a test fails, we have no information about why or how the failure happened. We only see the error messages in the Ginkgo output.

This makes debugging flaky tests like JobSeq really hard: https://github.com/volcano-sh/volcano/issues/4732

The only other output we have is a kind export if the test failed (the kind export in the generate-log function), which contains the container logs collected after the suite has finished.

Describe the solution you'd like

It would be good if we were able to dump the test context into the artifacts path. The directory name could be the namespace's name.

By test context I mean:

  • pods
  • podgroups
  • queues
  • priorityClasses
  • VCJobs
  • VCronjobs
  • Standard kubernetes jobs/cronjobs/deployments/statefulsets

It can be a separate TestContext function used in a JustAfterEach - CurrentSpecReport().Failed() block. Example from Ginkgo's documentation: https://onsi.github.io/ginkgo/#separating-diagnostics-collection-and-teardown-justaftereach
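A minimal sketch of that failure gate in Go, kept free of Ginkgo imports so that it compiles on its own. The names collectOnFailure and dumpNamespace are hypothetical, not existing Volcano helpers, and the wiring shown in the comment assumes Ginkgo v2:

```go
package main

import "fmt"

// collectOnFailure runs dump only when the spec failed, so passing specs
// pay no collection cost. The name is hypothetical, for illustration only.
//
// Assumed wiring in a Ginkgo v2 suite (not compiled here; dumpNamespace
// is a hypothetical collector):
//
//	JustAfterEach(func() {
//		err := collectOnFailure(CurrentSpecReport().Failed(), func() error {
//			return dumpNamespace(namespace)
//		})
//		if err != nil {
//			GinkgoWriter.Printf("context dump failed: %v\n", err)
//		}
//	})
func collectOnFailure(failed bool, dump func() error) error {
	if !failed {
		return nil
	}
	if err := dump(); err != nil {
		// Wrap and return instead of panicking: a diagnostics error
		// should not mask the original test failure.
		return fmt.Errorf("collecting test context: %w", err)
	}
	return nil
}
```

Running the collection in JustAfterEach rather than AfterEach means the namespace has not been torn down yet when the dump runs.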

hajnalmt avatar Nov 28 '25 12:11 hajnalmt

/good-first-issue /area test

hajnalmt avatar Nov 28 '25 12:11 hajnalmt

@hajnalmt: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue /area test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

volcano-sh-bot avatar Nov 28 '25 12:11 volcano-sh-bot

/assign

neeraj542 avatar Nov 29 '25 23:11 neeraj542

Hi @hajnalmt,

I'm planning to work on this issue (#4764).

My approach would be:

  1. Add a DumpTestContext() function in test/e2e/util/util.go that collects and dumps the Kubernetes resources (Pods, PodGroups, Queues, VCJobs, VCronJobs, and K8s Jobs/CronJobs/Deployments/StatefulSets) of a namespace.
  2. Use Ginkgo's JustAfterEach with CurrentSpecReport().Failed() to trigger the dump only on test failures.
  3. Save the YAML files to ARTIFACTS_PATH/{namespace}/ for easy inspection.

This will capture the cluster state when a test fails, making debugging easier. Let me know if I can go ahead and implement it.
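The file layout from step 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: dumpPath and writeTestContext are hypothetical names, the artifacts directory is passed in as a parameter instead of being read from ARTIFACTS_PATH, and the manifests map (resource kind to already-serialized YAML) stands in for the real API listings.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dumpPath returns <artifacts>/<namespace>/<kind>.yaml.
// Hypothetical helper, for illustration only.
func dumpPath(artifacts, namespace, kind string) string {
	return filepath.Join(artifacts, namespace, kind+".yaml")
}

// writeTestContext writes one YAML file per resource kind into a directory
// named after the namespace. The manifests map (kind -> serialized YAML) is
// an assumption standing in for real client listings.
func writeTestContext(artifacts, namespace string, manifests map[string][]byte) error {
	dir := filepath.Join(artifacts, namespace)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return fmt.Errorf("creating %s: %w", dir, err)
	}
	for kind, data := range manifests {
		if err := os.WriteFile(dumpPath(artifacts, namespace, kind), data, 0o644); err != nil {
			return fmt.Errorf("writing %s dump: %w", kind, err)
		}
	}
	return nil
}
```

One file per kind keeps a failed namespace browsable at a glance, e.g. pods.yaml next to podgroups.yaml.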

neeraj542 avatar Nov 29 '25 23:11 neeraj542

I'm testing this locally with the existing codebase, and it runs on my machine:

Image

neeraj542 avatar Nov 29 '25 23:11 neeraj542

Superb! The approach looks good. Thank you for picking this up.

hajnalmt avatar Nov 30 '25 08:11 hajnalmt

Sure, I'll implement this. Any suggestions for a first-time contributor to Volcano?

neeraj542 avatar Nov 30 '25 13:11 neeraj542

Hi @hajnalmt, I've raised a PR. It would be great if you could review #4767.

neeraj542 avatar Dec 04 '25 16:12 neeraj542