e2e-framework icon indicating copy to clipboard operation
e2e-framework copied to clipboard

After setup error, do klog.Error not klog.Fatal, and move on

Open codegold79 opened this issue 1 year ago • 7 comments
trafficstars

When an error happens during a setup step, instead of calling klog.Fatal which causes os.Exit, instead do klog.Error, break, run defer function with clean up steps, and return.

What type of PR is this?

/kind bug

What this PR does / why we need it:

I found this line to be the problem as to why our KinD cluster deletion was not happening as intended. As soon as an error value is passed during a setup step, klog.Fatalf(...) is called:

for _, setup := range setups {
    // context passed down to each setup
    if e.ctx, err = setup.run(e.ctx, e.cfg); err != nil {
        klog.Fatalf("%s failure: %s", setup.role, err)
   }
}

According to klog docs,

// Fatal logs to the FATAL, ERROR, WARNING, and INFO logs, // prints stack trace(s), then calls OsExit(255).

To reiterate, klog.Fatal is stopping all operations and exiting program with os.Exit. This is not the behavior I want. What I want to happen is for the setup range loop to stop trying to do the setup steps, and try to get to the function return, return m.Run(). However, before returning, I expect the defer block to run next where the finishing steps, like cluster cleanup, happen.

Which issue(s) this PR fixes:

Fixes #188

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Gracefully exit when setup fails

Additional documentation e.g., Usage docs, etc.:


codegold79 avatar Dec 08 '23 20:12 codegold79

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: codegold79 / name: frankly_coding (70b0316dcdbe482ec16e81615df3491da0e0e11b)

Welcome @codegold79!

It looks like this is your first PR to kubernetes-sigs/e2e-framework 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/e2e-framework has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Dec 08 '23 20:12 k8s-ci-robot

Hi @codegold79. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 08 '23 20:12 k8s-ci-robot

I'm thinking about how to write a test that ensures finish/clean up steps happen after setup step fails. This PR will be a WIP until I can get some tests up.

codegold79 avatar Dec 08 '23 20:12 codegold79

@codegold79 apologies for the delay. let me know if this ready for another round of review.

vladimirvivien avatar Feb 07 '24 21:02 vladimirvivien

Thanks, @vladimirvivien! I responded to your first review comment and tried to explain why I had to make a separate example and use TestMain.

As for your second review comment, I made the change as you suggested. I also got rid of the dummy test, as it doesn't help anything except to not trigger a warning that there are no tests.

The thing is, I'm not testing any features, I'm making sure the final actions complete even after a panic in the setup, before any feature tests happen.

PR is ready for review again.

codegold79 avatar Feb 27 '24 02:02 codegold79

@codegold79 (apologies for the late reply) I did spend sometimes to see if this can be tested using a normal test function and I think you are doing the right thing.

/lgtm

vladimirvivien avatar Apr 08 '24 15:04 vladimirvivien

Hi @codegold79 can you please squash commits. Thanks.

vladimirvivien avatar Apr 08 '24 15:04 vladimirvivien

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codegold79, cpanato

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Apr 18 '24 17:04 k8s-ci-robot