Bazel 5.2: Google Cloud Workload Identity Federation auth seems broken
Description of the bug:
Bazel 5.2 upgraded the Google Auth library to a version that supports Workload Identity Federation, which is useful for keyless authentication from CI pipelines. This can be verified in https://github.com/bazelbuild/bazel/pull/15383. However, when the credentials file is provided through the --google_credentials flag:
bazel build //... \
--remote_cache <cache-url> \
--google_credentials=${{ steps.auth.outputs.credentials_file_path }}
Bazel fails with this error:
Caused by: java.lang.IllegalArgumentException: Can not set java.util.List field com.google.api.client.http.HttpHeaders.authorization to java.lang.String
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
at java.base/jdk.internal.reflect.UnsafeObjectFieldAccessorImpl.set(Unknown Source)
at java.base/java.lang.reflect.Field.set(Unknown Source)
at com.google.api.client.util.FieldInfo.setFieldValue(FieldInfo.java:245)
at com.google.api.client.util.FieldInfo.setValue(FieldInfo.java:206)
at com.google.api.client.util.GenericData.set(GenericData.java:125)
at com.google.api.client.http.HttpHeaders.set(HttpHeaders.java:175)
at com.google.api.client.http.HttpHeaders.set(HttpHeaders.java:58)
at com.google.api.client.util.GenericData.putAll(GenericData.java:138)
at com.google.auth.oauth2.IdentityPoolCredentials.getSubjectTokenFromMetadataServer(IdentityPoolCredentials.java:233)
at com.google.auth.oauth2.IdentityPoolCredentials.retrieveSubjectToken(IdentityPoolCredentials.java:188)
at com.google.auth.oauth2.IdentityPoolCredentials.refreshAccessToken(IdentityPoolCredentials.java:169)
at com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:257)
at com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:254)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.auth.oauth2.OAuth2Credentials$AsyncRefreshResult.executeIfNew(OAuth2Credentials.java:580)
at com.google.auth.oauth2.OAuth2Credentials.asyncFetch(OAuth2Credentials.java:220)
at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:170)
at com.google.auth.oauth2.ExternalAccountCredentials.getRequestMetadata(ExternalAccountCredentials.java:292)
at com.google.devtools.build.lib.remote.http.AbstractHttpHandler.addCredentialHeaders(AbstractHttpHandler.java:73)
at com.google.devtools.build.lib.remote.http.HttpDownloadHandler.write(HttpDownloadHandler.java:141)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
at com.google.devtools.build.lib.remote.http.HttpCacheClient.lambda$get$6(HttpCacheClient.java:496)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
...
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
bazel build //... \
--remote_cache <cache-url> \
--google_credentials=${{ steps.auth.outputs.credentials_file_path }}
Which operating system are you running Bazel on?
Linux on Github Actions
What is the output of bazel info release?
5.2.0
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
No response
Have you found anything relevant by searching the web?
https://github.com/bazelbuild/bazel/issues/14278
Any other information, logs, or outputs that you want to share?
No response
@coeuvre I see you made the cherry-pick adding the related PR to Bazel 5.2 in the first place. Maybe you have a clue about what is wrong?
@bazaglia I'd like to look into this, but reproducing it seems to be quite involved. Do you have a repro that does not require setting up a GitHub action? I wonder if a fake credentials file (i.e., with sensitive data replaced by random strings) is sufficient to trigger the issue.
@tjgq if you're interested, I can set up a GH repository for you that reproduces this pretty easily.
I was able to reproduce this today on my production repo as well. I believe my instructions from https://github.com/bazelbuild/bazel/issues/14278 will still reproduce it with minimal effort.
The difficult part for me isn't setting up the GitHub repository, it's configuring the GCP workload identity provider: the google.com GCP org policy forbids me from using https://token.actions.githubusercontent.com as the issuer URI. I'd probably need to set up a separate GCP org, but that's going to require a lot more steps that I'm not familiar with.
I do have a working theory, though: in #15176 we upgraded google-auth-library-oauth2-http to 1.6.0, but its dependencies google-http-client and google-http-client-gson were kept at 1.22.0. According to Maven, the minimum required version is 1.41.1 (which in turn requires an additional dependency on opencensus-contrib-http-util 0.31.0). This strongly correlates with the stack trace above.
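For readers unfamiliar with this failure mode: the bottom of the trace is a reflective Field.set call that tries to store a java.lang.String into HttpHeaders.authorization, a field declared as java.util.List. The minimal sketch below reproduces only that JDK-level type mismatch in isolation. FakeHttpHeaders and setRaw are hypothetical stand-ins, not the google-http-client API, and the sketch makes no claim about how newer google-http-client versions handle header values.

```java
import java.lang.reflect.Field;
import java.util.List;

public class ReflectionMismatchDemo {

    // Hypothetical stand-in for com.google.api.client.http.HttpHeaders:
    // like the field named in the error message, "authorization" is a List.
    static class FakeHttpHeaders {
        public List<String> authorization;
    }

    // Hypothetical helper: writes the raw value with Field.set and does
    // NOT convert a String into a List first.
    static void setRaw(Object target, String fieldName, Object value)
            throws ReflectiveOperationException {
        Field field = target.getClass().getField(fieldName);
        // Field.set throws IllegalArgumentException if the value's type
        // is not assignable to the field's declared type.
        field.set(target, value);
    }

    // Returns true if storing a String into the List-typed field fails.
    static boolean stringIntoListFieldFails() {
        try {
            setRaw(new FakeHttpHeaders(), "authorization", "Bearer token");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Control case: storing a List into the same field works.
    static boolean listIntoListFieldSucceeds() {
        FakeHttpHeaders headers = new FakeHttpHeaders();
        try {
            setRaw(headers, "authorization", List.of("Bearer token"));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return List.of("Bearer token").equals(headers.authorization);
    }

    public static void main(String[] args) {
        System.out.println("String into List field fails: " + stringIntoListFieldFails());
        System.out.println("List into List field succeeds: " + listIntoListFieldSucceeds());
    }
}
```

The control case matters: the field itself is fine, and only a raw String value triggers the exception, which is consistent with the error message in the original report.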
@kylekurz Are you able to build Bazel with PR #16082 and let me know if you can still repro?
@tjgq I will give this a shot. Might not get to it until tomorrow though.
@tjgq sorry for the delay, been fighting migraines for a week. It doesn't look like that branch fixes this:
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:646)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Can not set java.util.List field com.google.api.client.http.HttpHeaders.authorization to java.lang.String
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
at java.base/jdk.internal.reflect.UnsafeObjectFieldAccessorImpl.set(Unknown Source)
at java.base/java.lang.reflect.Field.set(Unknown Source)
Let me know if I can get you more information. I built off the tip of your branch this morning.
EDIT: I did do a second run and dumped the credentials file, so I'm not passing a broken path to Bazel.
I was able to repro this today. It looks like there's a bug in the google-auth-library-oauth2-http library. I've sent https://github.com/googleapis/google-auth-library-java/pull/984 to fix it.
I'm no longer convinced there's a bug in google-auth-library-oauth2-http. The test case I added in googleapis/google-auth-library-java#984 passes even without the fix (as the maintainer pointed out).
I'm fairly sure PR #16082 was the right fix all along. I've just managed to run a GitHub action successfully with WIF using a Bazel built at that PR.
@tjgq does that mean my build of your branch was wrong? I definitely didn't get a successful WIF run using that, but I can try again if you'd like.
How exactly are you building and running Bazel? In particular, how does the built Bazel make it into the GitHub action execution environment?
I manage my own GHA runner in GCP so that some runs can use a local cache. I built the binary on that machine and called it directly instead of using the bazelisk wrapper.
Ok, so here's how I verified that it works for me:
- I checked out the https://github.com/tjgq/bazel/tree/auth branch and built Bazel with bazel build //src:bazel at commit a416cea.
- I copied the built bazel binary into a minimal repository containing a GitHub action that I set up according to the instructions at https://github.com/bazelbuild/bazel/issues/14278#issue-1053826946. Note that my action directly runs the tools/bazel binary I checked into the repo.
- I created a pull request and verified that the action runs to completion: https://github.com/tjgq/wif-repro/runs/8037790988
I've also confirmed that I get the reported crash if I check in a Bazel binary built without the changes in my PR.
One thing you might want to try is to grab the credentials JSON file and run the Bazel binary locally (to take some complexity out of the equation). I'm not sure that these credentials can be reused across build requests, but at least it should get Bazel to report a different error (I got something like a 401 Unauthorized when I tried).
Is this going to be included in a release soon?
It will definitely be included in 6.0, but I'm reluctant to backport it to 5.3.1. There's a lot of complexity in the interaction between Bazel and the OAuth2 support libraries, and we could very easily introduce other bugs.
@tjgq so I'm still not entirely sure what I did wrong building your branch, but I think I agree that your fix works. I took the binary in your test repo and put it on my CI machine, then ran a job that used it and it worked perfectly. Thanks for your research here, I will be watching for when this hits a released version of Bazel!
FYI, I'm going to backport this into 5.4.0 because I got a report of another user running into an issue related to this.