ratify
ratify copied to clipboard
ci: update azure SP federated credentials
Description
What this PR does / why we need it:
This PR migrates the Azure SPN used for AKS e2e tests to use federated identities via OIDC
- It removes dependence on SPN secret
- It adds a new
aks-deploy
environment for jobs that require azure auth - It force bumps the az cli version so that it is >=2.60.0 (Note: this can be removed after the default github runner image uses a newer version of the az cli https://github.com/actions/runner-images?tab=readme-ov-file#available-images)
- It caches access tokens for registry scope and key vault scope so that azure resource operations will not fail auth after 5 minutes.
Once PR merges to dev:
- delete ALL secrets in SP
- each of the maintainers can choose to create a federated credential to their own fork under environment
aks-deploy
Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when the PR gets merged):
Fixes #1435
Type of change
Please delete options that are not relevant.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Helm Chart Change (any edit/addition/update that is necessary for changes merged to the
main
branch) - [ ] This change requires a documentation update
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration
- [ ] Test A
- [ ] Test B
Checklist:
- [ ] Does the affected code have corresponding tests?
- [ ] Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
- [ ] Does this introduce breaking changes that would require an announcement or bumping the major version?
- [ ] Do all new files have appropriate license header?
Post Merge Requirements
- [ ] MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates
Helm Chart Change
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 68.16%. Comparing base (
1bd347c
) to head (5f9861c
). Report is 14 commits behind head on dev.
Additional details and impacted files
@@ Coverage Diff @@
## dev #1442 +/- ##
==========================================
+ Coverage 66.76% 68.16% +1.39%
==========================================
Files 116 119 +3
Lines 6030 6134 +104
==========================================
+ Hits 4026 4181 +155
+ Misses 1620 1559 -61
- Partials 384 394 +10
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@susanshi Here are some findings on current state of the aks test cleanup:
- The
aks-test-cleanup
job is currently set toneed
the aks-test job in order to run. Currently, theneeds
trigger by definition requires success of the job itneeds
. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and withalways()
the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditionalif
that is the same as the AKS test trigger but keep theneeds
. That also does not seem to work. - The current AKS test has a
cleanup()
function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.
I'm thinking maybe this cleanup optimization is out of scope for this PR?
verified on local fork at least one test passes:
@susanshi Here are some findings on current state of the aks test cleanup:
- The
aks-test-cleanup
job is currently set toneed
the aks-test job in order to run. Currently, theneeds
trigger by definition requires success of the job itneeds
. In our case, not all AKS tests usually succeed so we want a trigger that will only run after the AKS test if it is triggered BUT regardless if the AKS tests succeeded or failed. I looked at the docs and it makes it seem like it is possible: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-not-requiring-successful-dependent-jobs but this doesn't seem to work. I tried and withalways()
the cleanup step always runs irrespective of if AKS test ran or not which is not what we want. I also tried to add a conditionalif
that is the same as the AKS test trigger but keep theneeds
. That also does not seem to work.- The current AKS test has a
cleanup()
function that deletes the resource group upon a failure/exit. That seems to work however it may not always work? I do sometimes see that there are dangling resource groups.I'm thinking maybe this cleanup optimization is out of scope for this PR?
Thanks for the investigation. We added the cleanup job because the cleanup() function could be skipped. I agree it's not in scope of this PR, we can fix it later.