Azure test cleanup runs before tests complete

Open akashsinghal opened this issue 1 year ago • 4 comments

What happened in your environment?

The Azure e2e tests fail intermittently on the main branch. The new test cleanup stage seems to run concurrently with the actual tests, which leads to the resource group being deleted prematurely.

What did you expect to happen?

No response

What version of Kubernetes are you running?

No response

What version of Ratify are you running?

No response

Anything else you would like to add?

No response

Are you willing to submit PRs to contribute to this bug fix?

  • [ ] Yes, I am willing to implement it.

akashsinghal • Aug 25 '23 18:08

Do you mean the test cleanup stage caused the intermittent failure? I didn't see failures in the recent builds; I guess you re-ran the failed tests.

binbin-li • Aug 28 '23 04:08

@binbin-li It seems like the test cleanup stage might run in parallel with the actual test job, or it doesn't wait for all jobs to complete. Take this run as an example: https://github.com/deislabs/ratify/actions/runs/5979145301

The errors point to the resource already being deleted. I'm also curious about scenarios where multiple AKS tests are running in parallel. The resource cleanup will delete every resource group that has the prefix.

akashsinghal • Aug 28 '23 05:08

I suspect that it was deleted by a previous action run on the main branch. [image] The delete job was just deleting resources tagged with ratifye2e, and the previous job appears to have finished only a few minutes earlier. As for multiple AKS tests running at once, the cleanup stage would delete resources across tests, as you mentioned. This is a limitation of the current setup, as it was designed to delete previous resources as well.

binbin-li • Aug 28 '23 05:08

Ahh yes, good point. That's likely it. We'll probably run into this issue only when multiple AKS build-pr runs overlap. In this case, I made the release branch right after the PR merged for chart updates, leading to the conflict. I'm hoping this is not very common.

akashsinghal • Aug 28 '23 21:08
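
One possible way to avoid the cross-run deletion discussed above is to have the cleanup stage skip resource groups that another run created recently. The following is only a minimal sketch of that selection logic, not the project's actual cleanup script. The `ResourceGroup` type, the `run_id` and `created_at` fields, the two-hour `min_age` threshold, and the assumption that groups are identified by a `ratifye2e` name prefix are all hypothetical and chosen for illustration. No Azure API is called here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ResourceGroup:
    """Hypothetical view of a test resource group."""
    name: str
    run_id: str           # assumed tag recording the workflow run that created the group
    created_at: datetime  # assumed creation timestamp (timezone-aware)


def groups_safe_to_delete(groups, current_run_id, now,
                          prefix="ratifye2e", min_age=timedelta(hours=2)):
    """Return names of groups this run may delete without disturbing other runs.

    A group qualifies if it carries the test prefix and either belongs to the
    current run or is old enough that its run has presumably finished.
    """
    safe = []
    for group in groups:
        if not group.name.startswith(prefix):
            continue  # never touch groups outside the e2e prefix
        if group.run_id == current_run_id:
            safe.append(group.name)  # this run's own resources
        elif now - group.created_at >= min_age:
            safe.append(group.name)  # leftover from an older run
        # otherwise: recent group from another run, possibly still in use
    return safe


now = datetime(2023, 8, 28, 5, 0, tzinfo=timezone.utc)
example_groups = [
    ResourceGroup("ratifye2e-a", "100", now - timedelta(minutes=5)),
    ResourceGroup("ratifye2e-b", "101", now - timedelta(minutes=1)),
    ResourceGroup("ratifye2e-c", "90", now - timedelta(hours=3)),
    ResourceGroup("other-d", "80", now - timedelta(days=1)),
]
example_result = groups_safe_to_delete(example_groups, "101", now)
```

In this example, run `101` would delete its own group and the three-hour-old leftover, and would leave alone the group that run `100` created five minutes earlier. The age threshold is a trade-off: a value that is too small still risks deleting resources of a slow concurrent run, while a large value leaves stale resources around longer.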