
OpenID Authentication to BigQuery on GitHub Actions

Open • mik-laj opened this issue 1 year ago • 7 comments

Issue Description

  • Description of the issue:

Hi. I'm migrating my Python script that replicated data from Postgres to BigQuery over to Sling, but I ran into a problem authenticating to BigQuery. Before the migration, my workflow authenticated to BigQuery without problems, but now Sling raises this error:

~ Failed to get datasets in project: pola-bi-looker
Get "https://bigquery.googleapis.com/bigquery/v2/projects/pola-bi-looker/datasets?all=false&alt=json&pageToken=&prettyPrint=false": oauth2/google/externalaccount: invalid response when retrieving subject token: Get "https://pipelinesghubeus2.actions.githubusercontent.com/tbxRzkEOIWLXAKSf1wWOGhUMoEfjMwfVc97IR1ADJONKVKyKn0/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/c1c29e0b-3a29-4cb1-ac66-b704bba0d36d/jobs/ffcfb67d-d2df-5907-4b20-d6b205d37b70/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F354540873199%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fgithub%2Fproviders%2Fpola-backend-repo": context canceled

I verified that the gcloud SDK still has access to GCP, so the problem is not on the GCP side.

I even tried updating some Go libraries before building Sling, but it didn't help:

      - name: Set up GoLang
        uses: actions/setup-go@v3
        with:
          go-version: "1.21"
          cache: false

      - name: "Checkout"
        uses: actions/checkout@v4
        with:
          fetch-depth: 2
          repository: slingdata-io/sling-cli
          path: sling

      - run: |
          cd sling
          go mod edit -dropreplace='github.com/flarco/g' go.mod
          go mod edit -dropreplace='github.com/slingdata-io/sling' go.mod
          go mod edit -droprequire='github.com/slingdata-io/sling' go.mod
          go mod tidy
          go get -u golang.org/x/oauth2
          go get -u cloud.google.com/go
          go get -u cloud.google.com/go/bigquery
          go get -u cloud.google.com/go/bigtable
          go get -u cloud.google.com/go/storage
          go build -o /usr/local/bin/sling cmd/sling/*.go
          cd ..
  • Sling version (sling --version): 1.2.9

  • Operating System (linux, mac, windows): linux

  • Replication Configuration:

source: MY_POSTGRES
target: MY_BIGQUERY

defaults:
  mode: full-refresh
  object: '{target_schema}.raw__{stream_table}'
  primary_key: [id]

streams:
  public.ai_pics_aiattachment:
  public.ai_pics_aipics:
  public.bi_*:
    primary_key: []
  public.company_brand:
  public.company_company:
  public.gpc_*:
  public.pola_query:
    mode: incremental
    update_key: id
  public.pola_searchquery:
  public.pola_stats:
  public.product_product:
  public.report_attachment:
  public.report_report:
  public.users_user:
    select: ["-password"]

env:
  # Adds the _sling_loaded_at timestamp column
  SLING_LOADED_AT_COLUMN: true

  # Allows creating empty tables
  SLING_ALLOW_EMPTY: true
  • Log Output (please run command with -d):
$ sling conns set MY_BIGQUERY \
  type=bigquery \
  project="${GCP_PROJECT_ID}" \
  dataset="${GCP_BIGQUERY_DATASET}" \
  gc_bucket="${GCP_BUCKET_NAME}" \
  location="${GCP_REGION}"
2:06AM INF connection `MY_BIGQUERY` has been set in /home/runner/.config/.sling/env.yaml. Please test with `sling conns test MY_BIGQUERY`
$ sling conns test MY_BIGQUERY   
~ Failed to get datasets in project: pola-bi-looker
Get "https://bigquery.googleapis.com/bigquery/v2/projects/pola-bi-looker/datasets?all=false&alt=json&pageToken=&prettyPrint=false": oauth2/google/externalaccount: invalid response when retrieving subject token: Get "https://pipelinesghubeus2.actions.githubusercontent.com/XXXXXXXX/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/XXXXXXX/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F354540873199%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fgithub%2Fproviders%2Fpola-backend-repo": context canceled

mik-laj, May 05 '24 10:05

Hi, thanks for the details. I'm not too familiar with OpenID. Looking at https://cloud.google.com/docs/authentication/get-id-token#go, it seems you have to provide a URL to get the token. Is that correct? Did you have to provide a URL in Python?

flarco, May 05 '24 10:05

Here is my PR: https://github.com/KlubJagiellonski/pola-backend/pull/3799/files and here is my workflow file: https://github.com/KlubJagiellonski/pola-backend/blob/e5ca38e042d4f67a2b74ab879283b7407266fd6b/.github/workflows/bi-transfer_pola_backend_to_bq.yml

To authenticate with OpenID Connect, I use the google-github-actions/auth action:

      # Use Open ID Connect to authenticate to GCP
      # For details, see:
      # https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions
      - id: 'auth'
        name: 'Authenticate to GCP'
        uses: 'google-github-actions/auth@v2'
        with:
          project_id: '${{ env.GCP_PROJECT_ID }}'
          # yamllint disable-line rule:line-length
          workload_identity_provider: 'projects/${{ env.GCP_PROJECT_NUMBER }}/locations/global/workloadIdentityPools/${{ env.GCP_IDENTITY_POOL }}/providers/${{ env.GCP_IDENTITY_PROVIDER }}'

On GCP, OpenID Connect authentication is provided by the Workload Identity Federation feature. In my case, I use "(Preferred) Direct Workload Identity Federation". Here is a detailed setup guide: https://github.com/google-github-actions/auth?tab=readme-ov-file#preferred-direct-workload-identity-federation

mik-laj, May 05 '24 18:05
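By default, as far as we know, google-github-actions/auth writes an external_account credential configuration file and points GOOGLE_APPLICATION_CREDENTIALS at it, and Application Default Credentials in the Go libraries read that file. The sketch below parses the subset of fields that matter for Direct Workload Identity Federation, so you can check, in the same step that runs Sling, which audience, token URL and subject-token URL the libraries will use. The struct is deliberately partial; parseExternalAccount and sampleConfig (with placeholder values) are hypothetical, and header values are not printed because they carry the runner's bearer token.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// externalAccount models a subset of an "external_account" credential file,
// as used for Workload Identity Federation with a URL-sourced subject token.
type externalAccount struct {
	Type             string `json:"type"`
	Audience         string `json:"audience"`
	SubjectTokenType string `json:"subject_token_type"`
	TokenURL         string `json:"token_url"`
	CredentialSource struct {
		URL     string            `json:"url"`
		Headers map[string]string `json:"headers"`
		Format  struct {
			Type                  string `json:"type"`
			SubjectTokenFieldName string `json:"subject_token_field_name"`
		} `json:"format"`
	} `json:"credential_source"`
}

// parseExternalAccount decodes the file and checks the fields this flow relies on.
func parseExternalAccount(data []byte) (*externalAccount, error) {
	var ea externalAccount
	if err := json.Unmarshal(data, &ea); err != nil {
		return nil, err
	}
	if ea.Type != "external_account" {
		return nil, fmt.Errorf("credential type is %q, want %q", ea.Type, "external_account")
	}
	if ea.CredentialSource.URL == "" {
		return nil, fmt.Errorf("credential_source.url is empty")
	}
	return &ea, nil
}

// sampleConfig is a hypothetical file with placeholder values.
const sampleConfig = `{
  "type": "external_account",
  "audience": "//iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "url": "https://runner.example/idtoken?api-version=2.0",
    "headers": {"Authorization": "Bearer REDACTED"},
    "format": {"type": "json", "subject_token_field_name": "value"}
  }
}`

func main() {
	data := []byte(sampleConfig)
	if path := os.Getenv("GOOGLE_APPLICATION_CREDENTIALS"); path != "" {
		b, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		data = b
	}
	ea, err := parseExternalAccount(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("audience:        ", ea.Audience)
	fmt.Println("token_url:       ", ea.TokenURL)
	fmt.Println("subject token at:", ea.CredentialSource.URL)
	// Print header names only: the values include the runner's bearer token.
	for name := range ea.CredentialSource.Headers {
		fmt.Println("header:          ", name)
	}
}
```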

OK, I created this PR: https://github.com/slingdata-io/sling-cli/pull/290 but I am unable to test it, so I'll need your help with that. The connection now accepts an audience key, which is used to create the TokenSource that is then passed as the AuthOption when creating the BigQuery client.

See how the audience is used here: https://pkg.go.dev/google.golang.org/api/idtoken#NewTokenSource. I'm not sure what the audience URL should be, since it was not part of your Python solution. Hopefully you can figure that part out.

flarco, May 05 '24 22:05

Closing; feel free to reopen.

flarco, May 11 '24 02:05

Unfortunately, it still doesn't work.

I updated my PR to use your branch: https://github.com/KlubJagiellonski/pola-backend/pull/3799. I first run my Python script to confirm that all permissions work, and then I run your app compiled from source.

If you want, I can create a fork of your repository and configure identical access, and then give you access to the repository so that you can test freely. What do you think?

mik-laj, May 12 '24 18:05

> I can create a fork of your repository and configure identical access, and then give you access to the repository so that you can test freely.

Yes, that would be great. Please fork from https://github.com/slingdata-io/sling-cli/pull/290

flarco, May 13 '24 15:05