Bazel 5.2 Google Cloud's Workload identity federation auth seems broken

Open bazaglia opened this issue 3 years ago • 4 comments

Description of the bug:

Bazel 5.2 updated the Google Auth library to a version that supports Workload Identity Federation, which is useful for keyless authentication from pipelines. This can be verified in https://github.com/bazelbuild/bazel/pull/15383. However, when providing the credentials file through the google_credentials flag:

bazel build //... \
  --remote_cache <cache-url> \
  --google_credentials=${{ steps.auth.outputs.credentials_file_path }}

Bazel just throws an error:

Caused by: java.lang.IllegalArgumentException: Can not set java.util.List field com.google.api.client.http.HttpHeaders.authorization to java.lang.String
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
	at java.base/jdk.internal.reflect.UnsafeObjectFieldAccessorImpl.set(Unknown Source)
	at java.base/java.lang.reflect.Field.set(Unknown Source)
	at com.google.api.client.util.FieldInfo.setFieldValue(FieldInfo.java:245)
	at com.google.api.client.util.FieldInfo.setValue(FieldInfo.java:206)
	at com.google.api.client.util.GenericData.set(GenericData.java:125)
	at com.google.api.client.http.HttpHeaders.set(HttpHeaders.java:175)
	at com.google.api.client.http.HttpHeaders.set(HttpHeaders.java:58)
	at com.google.api.client.util.GenericData.putAll(GenericData.java:138)
	at com.google.auth.oauth2.IdentityPoolCredentials.getSubjectTokenFromMetadataServer(IdentityPoolCredentials.java:233)
	at com.google.auth.oauth2.IdentityPoolCredentials.retrieveSubjectToken(IdentityPoolCredentials.java:188)
	at com.google.auth.oauth2.IdentityPoolCredentials.refreshAccessToken(IdentityPoolCredentials.java:169)
	at com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:257)
	at com.google.auth.oauth2.OAuth2Credentials$1.call(OAuth2Credentials.java:254)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at com.google.auth.oauth2.OAuth2Credentials$AsyncRefreshResult.executeIfNew(OAuth2Credentials.java:580)
	at com.google.auth.oauth2.OAuth2Credentials.asyncFetch(OAuth2Credentials.java:220)
	at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:170)
	at com.google.auth.oauth2.ExternalAccountCredentials.getRequestMetadata(ExternalAccountCredentials.java:292)
	at com.google.devtools.build.lib.remote.http.AbstractHttpHandler.addCredentialHeaders(AbstractHttpHandler.java:73)
	at com.google.devtools.build.lib.remote.http.HttpDownloadHandler.write(HttpDownloadHandler.java:141)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
	at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
	at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
	at com.google.devtools.build.lib.remote.http.HttpCacheClient.lambda$get$6(HttpCacheClient.java:496)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
...

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

bazel build //... \
  --remote_cache <cache-url> \
  --google_credentials=${{ steps.auth.outputs.credentials_file_path }}

Which operating system are you running Bazel on?

Linux on GitHub Actions

What is the output of bazel info release?

5.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/bazel/issues/14278

Any other information, logs, or outputs that you want to share?

No response

Jun 08 '22 08:06 bazaglia

@coeuvre I see you made the cherry-pick adding the related PR to Bazel 5.2 in the first place. Maybe you have a clue about what is wrong?

Jun 08 '22 15:06 bazaglia

@bazaglia I'd like to look into this, but reproducing it seems to be quite involved. Do you have a repro that does not require setting up a GitHub action? I wonder if a fake credentials file (i.e., with sensitive data replaced by random strings) is sufficient to trigger the issue.

Aug 09 '22 13:08 tjgq

@tjgq if you're interested I can set you up a GH repository to reproduce this pretty easily.

Aug 09 '22 16:08 russellhaering

I was also able to reproduce this today on my production repo. I believe my instructions from https://github.com/bazelbuild/bazel/issues/14278 will still reproduce it with minimal effort.

Aug 09 '22 16:08 kylekurz

The difficult part for me isn't setting up the GitHub repository, it's configuring the GCP workload identity provider: the google.com GCP org policy forbids me from using https://token.actions.githubusercontent.com as the issuer URI. I'd probably need to set up a separate GCP org, but that's going to require a lot more steps that I'm not familiar with.

I do have a working theory, though: in #15176 we upgraded google-auth-library-oauth2-http to 1.6.0, but its dependencies google-http-client and google-http-client-gson were kept at 1.22.0. According to Maven, the minimum required version is 1.41.1 (which in turn requires an additional dependency on opencensus-contrib-http-util 0.31.0). This strongly correlates with the stack trace above.
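The exception at the top of the trace is the one Java reflection raises when `Field.set` receives a value whose type does not match the field's declared type. The following is a minimal sketch of that general JVM behaviour only; it is not the google-http-client code, and `Headers` and `trySet` are hypothetical names invented for illustration.

```java
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.List;

final class ReflectionTypeMismatch {
    // Hypothetical stand-in for a headers class whose field is declared as a List.
    static class Headers {
        List<String> authorization;
    }

    // Tries to assign `value` to the List-typed field via reflection and
    // reports whether the assignment was accepted.
    static boolean trySet(Object value) {
        try {
            Field field = Headers.class.getDeclaredField("authorization");
            field.setAccessible(true);
            field.set(new Headers(), value);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the value cannot be converted to the field's declared type.
            System.out.println("rejected: " + e.getMessage());
            return false;
        } catch (ReflectiveOperationException e) {
            // Not expected here: the field exists and is accessible.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A List matches the declared field type, so the assignment succeeds.
        System.out.println(trySet(Collections.singletonList("Bearer token")));
        // A bare String does not match, so Field.set throws IllegalArgumentException.
        System.out.println(trySet("Bearer token"));
    }
}
```

If the theory holds, the caller and the HTTP client library disagree about how a plain string header value reaches that field, which is the kind of mismatch a version skew between the two libraries could produce.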

@kylekurz Are you able to build Bazel with PR #16082 and let me know if you can still repro?

Aug 10 '22 15:08 tjgq

@tjgq I will give this a shot. Might not get to it until tomorrow though.

Aug 10 '22 19:08 kylekurz

@tjgq sorry for the delay, been fighting migraines for a week. It doesn't look like that branch fixes this:

	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:646)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Can not set java.util.List field com.google.api.client.http.HttpHeaders.authorization to java.lang.String
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
	at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(Unknown Source)
	at java.base/jdk.internal.reflect.UnsafeObjectFieldAccessorImpl.set(Unknown Source)
	at java.base/java.lang.reflect.Field.set(Unknown Source)

Let me know if I can get you more information. I built off the tip of your branch this morning.

EDIT: I did do a second run and dumped the credentials file, so I'm not passing a broken path to Bazel.

Aug 16 '22 14:08 kylekurz

I was able to repro this today. It looks like there's a bug in the google-auth-library-oauth2-http library. I've sent https://github.com/googleapis/google-auth-library-java/pull/984 to fix it.

Aug 19 '22 17:08 tjgq

I'm no longer convinced there's a bug in google-auth-library-oauth2-http. The test case I added in googleapis/google-auth-library-java#984 passes even without the fix (as the maintainer pointed out).

I'm fairly sure PR #16082 was the right fix all along. I've just managed to run a GitHub action successfully with WIF using a Bazel built at that PR.

Aug 26 '22 10:08 tjgq

@tjgq does that mean my build of your branch was wrong? I definitely didn't get a successful WIF run using that, but I can try again if you'd like.

Aug 26 '22 12:08 kylekurz

How exactly are you building and running Bazel? In particular, how does the built Bazel make it into the GitHub action execution environment?

Aug 26 '22 13:08 tjgq

I have a GHA runner I manage in GCP so I can have local cache for some runs, so I just built the binary (on that machine) and called it directly from there instead of using the bazelisk wrapper.

Aug 26 '22 13:08 kylekurz

Ok, so here's how I verified that it works for me:

  • I checked out the https://github.com/tjgq/bazel/tree/auth branch and built a Bazel with bazel build //src:bazel at commit a416cea.
  • I copied the built bazel into a minimal repository containing a GitHub action that I set up according to the instructions at https://github.com/bazelbuild/bazel/issues/14278#issue-1053826946. Note that my action directly runs the tools/bazel binary I checked into the repo.
  • I created a pull request and verified that the action runs to completion: https://github.com/tjgq/wif-repro/runs/8037790988

I've also confirmed that I get the reported crash if I check in a Bazel binary built without the changes in my PR.

One thing you might want to try is to grab the credentials JSON file and run the Bazel binary locally (to take some complexity out of the equation). I'm not sure that these credentials can be reused across build requests, but you should at least get Bazel to report a different error (I got something like a 401 Unauthorized when I tried).

Aug 26 '22 13:08 tjgq

Is this going to be included in a release soon?

Sep 01 '22 17:09 jbms

It will definitely be included in 6.0, but I'm reluctant about backporting it to 5.3.1. There's a lot of complexity in the interaction between Bazel and the OAuth2 support libraries, and we could very easily introduce other bugs.

Sep 02 '22 09:09 tjgq

@tjgq so I'm still not entirely sure what I did wrong building your branch, but I think I agree that your fix works. I took the binary in your test repo and put it on my CI machine, then ran a job that used it and it worked perfectly. Thanks for your research here, I will be watching for when this hits a released version of Bazel!

Sep 13 '22 14:09 kylekurz

FYI, I'm going to backport this into 5.4.0 because I got a report of another user running into an issue related to this.

Nov 11 '22 13:11 tjgq