velero-plugin-for-microsoft-azure
Don't use storage account key for object storage if SP or MI is provided
Implements https://github.com/vmware-tanzu/velero/issues/4267
The plugin no longer uses the storage access key to access the blobs by default. The key is only used if it's configured via `AZURE_STORAGE_ACCOUNT_ACCESS_KEY`. Therefore "Allow storage account key access" can be disabled on the storage account if using MI or SP.
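As a rough sketch of that selection logic with the current azure-sdk-for-go packages (an illustration under stated assumptions, not the plugin's literal code): the shared key is used only when `AZURE_STORAGE_ACCOUNT_ACCESS_KEY` is set, otherwise authentication falls back to AAD via MI or SP.

```go
package example

import (
	"fmt"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

// newBlobClient sketches the selection described above: use the shared key
// only when AZURE_STORAGE_ACCOUNT_ACCESS_KEY is set, otherwise authenticate
// via AAD (managed identity or service principal).
func newBlobClient(account string) (*azblob.Client, error) {
	serviceURL := fmt.Sprintf("https://%s.blob.core.windows.net/", account)
	if key := os.Getenv("AZURE_STORAGE_ACCOUNT_ACCESS_KEY"); key != "" {
		skc, err := azblob.NewSharedKeyCredential(account, key)
		if err != nil {
			return nil, err
		}
		return azblob.NewClientWithSharedKeyCredential(serviceURL, skc, nil)
	}
	// DefaultAzureCredential picks up MI or SP settings from the environment.
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return nil, err
	}
	return azblob.NewClient(serviceURL, cred, nil)
}
```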
Changes:
- Switched object_store to the newer azblob client
- When the storage account access key is not provided, a user delegation SAS is used instead of a service SAS.
  - Delegation SAS isn't implemented in the azblob client yet, so I implemented a simple version in `delegationsas.go`. This code can be deleted once it's implemented in azblob (see the sketch after this list).
- `resourceGroupConfigKey` and `subscriptionIDConfigKey` are no longer needed for the object store
- Added an integration test for the object storage
  - Only runs when the tag `integration` is specified, e.g. `go test ./... -tags=integration -v` (see the build-tag sketch below)
  - The storage account for the test has to be created separately.
- Updated README
- Updated deps
  - go 1.17
  - go dependencies
  - busybox image
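For context on the delegation-SAS bullet above: newer releases of the azure-sdk-for-go azblob packages expose user delegation SAS directly, which is roughly what `delegationsas.go` implements by hand. A minimal sketch (the helper name and the SAS scope/permissions are illustrative, not the plugin's actual code):

```go
package example

import (
	"context"
	"fmt"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/sas"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/service"
)

// userDelegationSAS builds a container-scoped SAS signed with a user
// delegation key obtained via AAD; no storage account key is involved.
func userDelegationSAS(ctx context.Context, account, container string) (string, error) {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return "", err
	}
	svc, err := service.NewClient(fmt.Sprintf("https://%s.blob.core.windows.net/", account), cred, nil)
	if err != nil {
		return "", err
	}
	start := time.Now().UTC().Add(-10 * time.Minute)
	expiry := start.Add(2 * time.Hour)
	startStr, expiryStr := start.Format(sas.TimeFormat), expiry.Format(sas.TimeFormat)

	// Ask AAD for a user delegation key valid for the SAS window.
	udc, err := svc.GetUserDelegationCredential(ctx,
		service.KeyInfo{Start: &startStr, Expiry: &expiryStr}, nil)
	if err != nil {
		return "", err
	}

	// Sign read/write/list permissions on the container with that key.
	qps, err := sas.BlobSignatureValues{
		Protocol:      sas.ProtocolHTTPS,
		StartTime:     start,
		ExpiryTime:    expiry,
		ContainerName: container,
		Permissions:   (&sas.ContainerPermissions{Read: true, Write: true, List: true}).String(),
	}.SignWithUserDelegation(udc)
	if err != nil {
		return "", err
	}
	return qps.Encode(), nil
}
```

The key point is that `GetUserDelegationCredential` is authorized via AAD, so no storage account key is ever touched.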
I downgraded it to 1.16 and also added a Go setup step to `pr.yaml`; it seems it was using the preinstalled version before.
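To illustrate the `integration` tag gating from the change list above: a test file guarded by a build constraint is only compiled when the tag is passed. A minimal skeleton (the package name, env var, and test body are illustrative; Go 1.16 and earlier also need the older `// +build` comment form):

```go
//go:build integration
// +build integration

package main_test

import (
	"os"
	"testing"
)

// Compiled and run only with: go test ./... -tags=integration -v
func TestObjectStoreIntegration(t *testing.T) {
	// Hypothetical guard: the storage account must be created separately.
	if os.Getenv("INTEGRATION_STORAGE_ACCOUNT") == "" {
		t.Skip("INTEGRATION_STORAGE_ACCOUNT not set; skipping")
	}
	// ... exercise PutObject/GetObject against the real storage account ...
}
```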
Hi!
Does someone have time to review this? BTW: is there a way to run integration tests against Azure in this repo?
Thanks, Yves
Rebased to master. Any chance this can be merged?
This is exactly what we need as well and makes a lot more sense than dealing with access keys when they aren't necessary.
Codecov Report
Merging #111 (e7efb9e) into main (73d1530) will increase coverage by 1.29%. The diff coverage is 7.00%.
```diff
@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
+ Coverage   12.77%   14.07%   +1.29%
==========================================
  Files           4        4
  Lines         626      668      +42
==========================================
+ Hits           80       94      +14
- Misses        542      571      +29
+ Partials        4        3       -1
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| velero-plugin-for-microsoft-azure/object_store.go | 2.38% <0.54%> (-0.27%) | :arrow_down: |
| ...o-plugin-for-microsoft-azure/volume_snapshotter.go | 23.68% <4.44%> (+0.28%) | :arrow_up: |
| velero-plugin-for-microsoft-azure/common.go | 24.59% <55.55%> (+24.59%) | :arrow_up: |
Does the PR also address this issue? https://github.com/vmware-tanzu/velero/issues/4930
Yes, ListKeys will be removed with this PR.
Awesome! Thank you for your contribution.
Reliance on Azure storage account shared keys is preventing my org from using Velero, would be great to see this merged :heart: @ywk253100 @eleanor-millman
Hi @yvespp, I was trying to test out this PR. One basic flow I was not able to validate: I created a BSL with incorrect params (the storage account didn't exist), but the BSL is showing as available. Possibly some underlying layers are not returning errors? The logs also did not help much with debugging.
When I created a backup, the following error came up in the backup status. The BSL was still showing as available:
```
completionTimestamp: "2022-09-08T05:12:03Z"
expiration: "2022-10-08T05:12:03Z"
failureReason: |
  error checking if backup already exists in object storage: rpc error: code = Unknown desc = ===== RESPONSE ERROR (ErrorCode=AuthorizationPermissionMismatch) =====
  Description=, Details: (none)
formatVersion: 1.1.0
phase: Failed
startTimestamp: "2022-09-08T05:12:03Z"
version: 1
```
@yvespp Sorry for the late review.
I have added some comments but haven't reviewed `delegationsas.go` yet; I will continue to add comments.
Hi! Would be really great if this could be merged sometime soon :) @sseago @ywk253100 @dsu-igeek
I second that. This PR will allow me to start using Velero.
Hi @anshulahuja98, do you know how the backup location is verified? I was looking through the code but couldn't find it.
@anshulahuja98 the BSL verification should work correctly now. `ListCommonPrefixes` and `ListObjects` were returning the wrong error. However, the error is a bit confusing, as a 401 (`RESPONSE 401: 401 Server failed to authenticate the request`) is returned instead of a 404.
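For the BSL check, the useful part is mapping the SDK's error codes to something Velero can act on. A hedged sketch of that kind of mapping with the azblob `bloberror` helpers (my illustration of the idea, not the PR's actual code):

```go
package example

import (
	"fmt"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/bloberror"
)

// classifyListError distinguishes "container not found" from auth failures
// so a BSL check can surface the right condition instead of a generic 401.
func classifyListError(err error) error {
	switch {
	case err == nil:
		return nil
	case bloberror.HasCode(err, bloberror.ContainerNotFound):
		return fmt.Errorf("container not found: %w", err)
	case bloberror.HasCode(err, bloberror.AuthorizationPermissionMismatch):
		return fmt.Errorf("credential lacks data-plane permissions: %w", err)
	default:
		return err
	}
}
```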
Great that the BSL validation will be fixed.
With regards to the error: yes, it has become much more confusing now. Earlier it was easier to decode information from the error message: for permission issues, the client ID and the target principal ID were explicitly mentioned, which made fixing permissions straightforward. I am assuming that with this route we might not be able to get verbose error messages.
It would be great if we could still find some way to make them verbose.
@yvespp The object store uses the new version of Azure SDK to do the auth and operations in this PR, but the snapshotter still uses the old one. Is it possible to update the snapshotter to use the new SDK as well?
I moved the snapshotter to the new SDK now, and go-autorest was completely removed. I couldn't test this, so it would be good if someone else could verify that it works.
@yvespp I have finished all the reviewing and thanks very much for the contribution. I'll trigger end-to-end testing to test the new plugin and will let you know the result
The BSL validation still doesn't work with the error:

```
RESPONSE 403: 403 This request is not authorized to perform this operation using this permission.
ERROR CODE: AuthorizationPermissionMismatch
```
And I found another issue: since we change the permissions needed for the customized role, users who use a customized role will face a breaking change.
It will be a breaking change even if a custom role isn't used. To write to blob storage using MI or SP, the `Storage Blob Data Contributor` role is needed; `Contributor` isn't enough. I added a changelog to document this.
In the last commit I also updated the docs with the required roles.
So this is the reason that I got the `AuthorizationPermissionMismatch` error, as I still use the old credential with only the `Contributor` role.
I'm afraid this breaking change isn't acceptable; it impacts all upgraded users.
We should reconsider the changes, and as we're very close to the 1.10 FC date, this PR may be out of the scope of 1.10 if we have no better solution.
One option would be to fall back to fetching the shared key if accessing the blobs using the default credential fails. But because the plugin doesn't know the container at init time to test access, the client creation has to be delayed until the object store is used. I'll see if I can implement that without making the code too complicated.
I think it would still make sense to clean up the permissions at some point and remove this compatibility code. My question is: when would it be allowed to make that breaking change?
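A minimal sketch of that fallback idea, assuming lazy initialization (the constructor variables are hypothetical stand-ins for the plugin's credential plumbing, not its real API):

```go
package example

import (
	"context"
	"sync"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

// Hypothetical constructors standing in for the plugin's credential plumbing.
var (
	newAADClient       func(ctx context.Context) (*azblob.Client, error)
	newSharedKeyClient func(ctx context.Context) (*azblob.Client, error)
)

// lazyStore defers client creation until the object store is first used,
// so the fallback decision can be made against the actual container.
type lazyStore struct {
	once      sync.Once
	container string
	client    *azblob.Client
	err       error
}

func (s *lazyStore) get(ctx context.Context) (*azblob.Client, error) {
	s.once.Do(func() {
		s.client, s.err = newAADClient(ctx)
		if s.err != nil {
			return
		}
		// Probe the container once; on failure, fall back to the shared key.
		pager := s.client.NewListBlobsFlatPager(s.container, nil)
		if _, probeErr := pager.NextPage(ctx); probeErr != nil {
			s.client, s.err = newSharedKeyClient(ctx)
		}
	})
	return s.client, s.err
}
```

The `sync.Once` keeps the probe to a single attempt, so normal operations don't pay for the check twice.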
I went a different route. I added an optional config parameter `disableStorageAccountKeyFetch`, which is set to "false" by default. If it is "false", the object store works as before: the SP or MI is used only to fetch the storage account key, which is then used to access the storage account. This requires no additional permissions and won't break existing users.
If it is set to "true", the storage account key won't be used, but the new permissions are required.
I documented the change with the recommendation to set `disableStorageAccountKeyFetch` to "true", so new users will not use the old behaviour. In a future release the old behaviour and the option can be removed.
@yvespp Thanks for the quick update. I think we need a new major version (v2.x) to introduce this kind of breaking change, so it may be a long time before we can remove the backward-compatibility code.
I'll take a look at the new changes and test them in my local env and run the E2E test, will let you know the result.
@yvespp I'm a little confused about the Azure roles. The `Contributor` role has almost all the permissions, so why do we still need the `Storage Blob Data Contributor` role? Is there any Azure doc for this part?
The clearest doc about this I found is here: Authorize access to blobs using Azure Active Directory
> Only roles explicitly defined for data access permit a security principal to access blob data. Built-in roles such as Owner, Contributor, and Storage Account Contributor permit a security principal to manage a storage account, but do not provide access to the blob data within that account via Azure AD. However, if a role includes Microsoft.Storage/storageAccounts/listKeys/action, then a user to whom that role is assigned can access data in the storage account via Shared Key authorization with the account access keys. For more information, see Choose how to authorize access to blob data in the Azure portal.
I think about it this way: `Contributor` is for the control plane and `Storage Blob Data Contributor` for the data plane.
I removed `disableStorageAccountKeyFetch` again; the storage access key is now only used if `subscriptionId` and `resourceGroup` are set. I think this is easier to configure, as users don't need to add the `disableStorageAccountKeyFetch` parameter now and then remove it again in a future release.
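For reference, a sketch of what that key-fetch path looks like with the new SDK's `armstorage` package (illustrative; the function name is mine, not the plugin's). The key is resolved through the management plane, which is why plain `Contributor` is enough for this path:

```go
package example

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/storage/armstorage"
)

// fetchStorageKey resolves the account key through the management plane,
// which only happens when subscriptionId and resourceGroup are configured.
func fetchStorageKey(ctx context.Context, subscriptionID, resourceGroup, account string) (string, error) {
	cred, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		return "", err
	}
	client, err := armstorage.NewAccountsClient(subscriptionID, cred, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.ListKeys(ctx, resourceGroup, account, nil)
	if err != nil {
		return "", err
	}
	if len(resp.Keys) == 0 || resp.Keys[0].Value == nil {
		return "", fmt.Errorf("no keys returned for storage account %s", account)
	}
	return *resp.Keys[0].Value, nil
}
```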
@yvespp Thanks for the quick update.
I'm working on some other priorities and have no bandwidth to review the new changes in the next few days, so it cannot meet the FC date (early next week). I'll come back to review it again before RC, but as the changes introduced by this PR are very fundamental, it's better to run enough tests for it. So if we cannot do enough review and testing before the RC, I'd like to move it to the 1.11 release and have it merged in the early stage of 1.11.