azkv: Allow specifying auth method and add cacheable authentication methods
Azure is just really awful and slow to work with; using sops with an azkv key to edit secrets from e.g. your laptop, where you authenticate as a user and not through a SP/MSI, is nightmarishly slow.
To improve the situation this PR implements:
- Allows explicitly specifying the auth method through an environment variable, bypassing the need to go through the default authentication chain.
- Adds two cacheable auth methods, `InteractiveBrowserCredential` and `DeviceCodeCredential`, which give a very significant speedup compared to `AzureCliCredential`. See below.
The time to unlock a secret using sops v3.9.4 and authenticating as a user through azure-cli:
> time sops -d ./path/to/secret
4.77s user 1.17s system 82% cpu 7.217 total
The time to unlock a secret using my PR and authenticating as a user with InteractiveBrowserCredential after the initial authentication:
> time SOPS_AZURE_AUTH_METHOD="BROWSER" sops -d ./path/to/secret
0.08s user 0.05s system 25% cpu 0.511 total
Related issues: #885, #1606
I'm happy to add documentation, tests or whatever is needed. I just don't want to go through the trouble before I know if the design is acceptable, and that there is a chance that this could be merged.
Thanks for your contribution! I've been looking at and thinking about this since yesterday, and I think the general idea is sound. (I'm not an Azure user though... I only use AZP for another open source project, and if you're writing that Azure is awful and slow to work with, I can guess what you mean, since AZP definitely also falls into that category :D)
I'm a bit hesitant about the caching parts, since that writes credentials to disk (I guess not in an encrypted form, and even if it's encrypted, it likely won't be anything safe). This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk". (I first just wanted to suggest including that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS's keyring, or some other agent, so including the method where it's cached is probably a good idea.) WDYT?
I also have some smaller comments, I'm adding them here (even though they're not relevant to the general design) before I forget about them:
- Please undo the changes in go.mod to `go` and `toolchain`; that's likely the reason the build fails in CI.
- I'd probably move most of the new functions to a separate file, say `azkv/credentials.go`.
- Is there a reason the environment variable values are upper-case? I think lower-case is more common. (They also used to be lower-case with the `AZURE_AUTH_METHOD` env variable that was removed in #1067.)
- The error message in the switch's default case should mention that it's an environment variable. Someone might have forgotten that the environment variable is set and be wondering where to find this `SOPS_AZURE_AUTH_METHOD` setting.
- For the cached credential filenames, I'd use dashes instead of underscores.
- Generally I'd move the strings used (env variable values, cache filenames) into constants.
CC @hiddeco, since you created #1067 which removed the AZURE_AUTH_METHOD environment variable.
(And I think it would be better if Azure's SDK would take care of things like this in its default auth flow, or would offer another auth flow that allows the user to configure this via env variables. But if it doesn't offer that, I guess we have to do it ourselves...)
The `AuthenticationRecord` that I am storing in the function `cacheStoreRecord` is not actually a secret; it only contains information like identifiers, version, username, etc.
The actual storing of the token is handled by the azure-sdk; the tokens are encrypted at rest via OS-specific mechanisms:
> Persistent caches are encrypted at rest using a mechanism that depends on the operating system (see docs)
- Linux: kernel key retention service (keyctl)
- macOS: Keychain (requires cgo and native build tools)
- Windows: Data Protection API (DPAPI)
Their respective implementations can be found here:
- Linux: https://github.com/Azure/azure-sdk-for-go/blob/sdk/azidentity/cache/v0.3.2/sdk/azidentity/cache/linux.go
- Windows: https://github.com/AzureAD/microsoft-authentication-extensions-for-go/blob/main/cache/accessor/windows.go
- macOS: https://github.com/AzureAD/microsoft-authentication-extensions-for-go/blob/main/cache/accessor/darwin.go
> This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk" (I first just wanted to suggest including that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS's keyring, or some other agent, so including the method where it's cached is probably a good idea). WDYT?
Since the token is encrypted using a method specific to the OS, it's difficult to think of what contextual information to add to SOPS_AZURE_AUTH_METHOD; the best I can come up with is `browser_cache_in_keyring` and `device_code_cache_in_keyring`. Does that seem fine?
I'll submit an update in the next few days with the fixes to the comments, adding some docs and tests (if I can figure something out to get around the interactive parts).
I just have one question: should I keep the environment variable SOPS_AZURE_AUTH_METHOD or should I change it back to AZURE_AUTH_METHOD? The context for changing it is that it was not clear to me, when I was using AZURE_AUTH_METHOD, that it was a sops-specific variable until digging through the code. That's why I figured it might be good to make that clear by prefixing it, kind of like SOPS_AGE_KEY is prefixed with sops. I'm happy to defer to your judgment on this though.
> The `AuthenticationRecord` that I am storing in the function `cacheStoreRecord` is not actually a secret; it only contains information like identifiers, version, username, etc.
> The actual storing of the token is handled by the azure-sdk; the tokens are encrypted at rest via OS-specific mechanisms:
> Persistent caches are encrypted at rest using a mechanism that depends on the operating system (see docs)
Thanks for clarifying that! In that case, I don't mind the caching on disk.
> This should definitely be explicitly mentioned in the documentation, and maybe it would be better to rename the corresponding values for SOPS_AZURE_AUTH_METHOD to include "caching on disk" (I first just wanted to suggest including that it's caching, but then I thought that later one might want to cache it somewhere else, like in the OS's keyring, or some other agent, so including the method where it's cached is probably a good idea). WDYT?
> Since the token is encrypted using a method specific to the OS, it's difficult to think of what contextual information to add to `SOPS_AZURE_AUTH_METHOD`; the best I can come up with is `browser_cache_in_keyring` and `device_code_cache_in_keyring`. Does that seem fine?
Since no sensitive information is cached, I think it's OK to not use a prefix, or simply use cached- or something like that (without more details).
> I just have one question: should I keep the environment variable `SOPS_AZURE_AUTH_METHOD` or should I change it back to `AZURE_AUTH_METHOD`? The context for changing it is that it was not clear to me, when I was using `AZURE_AUTH_METHOD`, that it was a sops-specific variable until digging through the code. That's why I figured it might be good to make that clear by prefixing it, kind of like `SOPS_AGE_KEY` is prefixed with sops. I'm happy to defer to your judgment on this though.
I would keep the prefix. I think using env variables without a SOPS_ prefix is only a good idea if they use the same convention as the tool they hint at (which is usually best handled by a library provided by that tool). Also, the new environment variable uses different values and semantics than the previous one, so its previous existence is no reason to re-use that name.
I've addressed the minor comments and added docs for the feature. However, I'm not sure what to do about the go and toolchain versions in the go.mod file. This auto-upgrading behaviour of the Go toolchain is a bit new to me.
If I run go mod tidy with the toolchain and go version that is specified in the go.mod file it gives me this error:
> GOTOOLCHAIN=go1.23.6 go mod tidy -go=1.22
go: cloud.google.com/go/[email protected] requires [email protected], but 1.22 is requested
If I run go mod tidy with the toolchain as the latest go1.22 patch release it gives me this error:
> GOTOOLCHAIN=go1.22.12 go mod tidy
go: golang.org/x/[email protected] requires go >= 1.23.0 (running go 1.22.12; GOTOOLCHAIN=go1.22.12)
If I run go mod tidy with the toolchain that is specified in go.mod but without `-go=<version>`, it succeeds, but it modifies the go version in the go.mod file:
> GOTOOLCHAIN=go1.23.6 go mod tidy
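For context, the two go.mod directives in question would look something like this (versions mirror the commands above): the `go` line declares the minimum language version the module requires, while `toolchain` names a newer toolchain that the go command may auto-select (or download) when the installed one is older, which is what produces the "upgrade" behaviour seen here.

```
go 1.22

toolchain go1.23.6
```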
Please note that you must sign off every single commit, otherwise we cannot merge this (see the failing DCO test).
> Please note that you must sign off every single commit, otherwise we cannot merge this (see the failing DCO test).

I've rebased all the changes into a single commit and will sign the DCO.
Since the main branch has updated the go version in the go.mod file, that also fixed the issues I was having where go wanted to upgrade to 1.23. So hopefully the tests will pass now.
Since we've been using the forked version for a little while now, I've realized that on the initial authentication the browser and device-code login auth methods pollute stdout; I've included a fix for this issue in this PR.
To actually get this into a shape where it is mergeable, changes have to be made to build the macOS binaries with CGO enabled. This will likely raise some issues around cross-compilation. If we go the CGO route, we should probably plan out how this affects our CI, release process, and broader contributor experience.
Ah, it was CGO that was the issue! I have been maintaining a fork with this change for the past 9 months now, which we use in my team; several people on my team are Mac users, so I've been very confused about why it refused to build here in CI. It's because we build it with nix, which defaults to CGO.
I've spent a fair bit of time today trying to get this cross-compiling, and so far the only method that actually built the thing was using zig. Would using the C compiler and linker from zig to build the cross-compiled version be an acceptable solution?
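For the record, the zig-based cross-compile looks roughly like the sketch below. The target triple, the output name, and the `./cmd/sops` package path are assumptions from memory; treat this as a starting point, not a final CI recipe.

```sh
# Cross-compile a CGO-enabled darwin/arm64 binary from Linux,
# using zig as the C compiler and linker.
CGO_ENABLED=1 GOOS=darwin GOARCH=arm64 \
  CC="zig cc -target aarch64-macos" \
  CXX="zig c++ -target aarch64-macos" \
  go build -o sops-darwin-arm64 ./cmd/sops
```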
Otherwise, using `macos-latest` seems to work out of the box for building the darwin/arm64 and darwin/amd64 targets. The tests fail, but you are not actually running the tests for darwin right now either.
wdyt? @hiddeco