sling-cli
OpenID Authentication to BigQuery on GitHub Action
Issue Description
- Description of the issue:
Hi. I'm migrating from a Python script that replicated data from Postgres to BigQuery to Sling, but I ran into a problem authenticating to BigQuery. Before the migration, my workflow authenticated to BigQuery successfully, but now when I run Sling it raises this error:
~ Failed to get datasets in project: pola-bi-looker
Get "https://bigquery.googleapis.com/bigquery/v2/projects/pola-bi-looker/datasets?all=false&alt=json&pageToken=&prettyPrint=false": oauth2/google/externalaccount: invalid response when retrieving subject token: Get "https://pipelinesghubeus2.actions.githubusercontent.com/tbxRzkEOIWLXAKSf1wWOGhUMoEfjMwfVc97IR1ADJONKVKyKn0/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/c1c29e0b-3a29-4cb1-ac66-b704bba0d36d/jobs/ffcfb67d-d2df-5907-4b20-d6b205d37b70/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F354540873199%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fgithub%2Fproviders%2Fpola-backend-repo": context canceled
I checked that the gcloud sdk still has access to GCP, so it is not a problem on the GCP side.
I even tried to update some golang libraries, but it didn't help.
- name: Set up GoLang
  uses: actions/setup-go@v3
  with:
    go-version: "1.21"
    cache: false
- name: "Checkout"
  uses: actions/checkout@v4
  with:
    fetch-depth: 2
    repository: slingdata-io/sling-cli
    path: sling
- run: |
    cd sling
    go mod edit -dropreplace='github.com/flarco/g' go.mod
    go mod edit -dropreplace='github.com/slingdata-io/sling' go.mod
    go mod edit -droprequire='github.com/slingdata-io/sling' go.mod
    go mod tidy
    go get -u golang.org/x/oauth2
    go get -u cloud.google.com/go
    go get -u cloud.google.com/go/bigquery
    go get -u cloud.google.com/go/bigtable
    go get -u cloud.google.com/go/storage
    go build -o /usr/local/bin/sling cmd/sling/*.go
    cd ..
- Sling version (sling --version): 1.2.9
- Operating System (linux, mac, windows): linux
- Replication Configuration:
source: MY_POSTGRES
target: MY_BIGQUERY
defaults:
  mode: full-refresh
  object: '{target_schema}.raw__{stream_table}'
  primary_key: [id]
streams:
  public.ai_pics_aiattachment:
  public.ai_pics_aipics:
  public.bi_*:
    primary_key: []
  public.company_brand:
  public.company_company:
  public.gpc_*:
  public.pola_query:
    mode: incremental
    update_key: id
  public.pola_searchquery:
  public.pola_stats:
  public.product_product:
  public.report_attachment:
  public.report_report:
  public.users_user:
    select: ["-password"]
env:
  # Adds the _sling_loaded_at timestamp column
  SLING_LOADED_AT_COLUMN: true
  # Allows creating empty tables
  SLING_ALLOW_EMPTY: TRUE
- Log Output (please run command with -d):
$ sling conns set MY_BIGQUERY \
type=bigquery \
project="${GCP_PROJECT_ID}" \
dataset="${GCP_BIGQUERY_DATASET}" \
gc_bucket="${GCP_BUCKET_NAME}" \
location="${GCP_REGION}"
2:06AM INF connection `MY_BIGQUERY` has been set in /home/runner/.config/.sling/env.yaml. Please test with `sling conns test MY_BIGQUERY`
$ sling conns test MY_BIGQUERY
~ Failed to get datasets in project: pola-bi-looker
Get "https://bigquery.googleapis.com/bigquery/v2/projects/pola-bi-looker/datasets?all=false&alt=json&pageToken=&prettyPrint=false": oauth2/google/externalaccount: invalid response when retrieving subject token: Get "https://pipelinesghubeus2.actions.githubusercontent.com/XXXXXXXX/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/XXXXXXX/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F354540873199%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fgithub%2Fproviders%2Fpola-backend-repo": context canceled
Hi, thanks for the details. I’m not too familiar with OpenID. Looking here: https://cloud.google.com/docs/authentication/get-id-token#go It seems like you have to provide a URL to get the token. Is that correct? Did you have to provide a URL in Python?
Here is my PR: https://github.com/KlubJagiellonski/pola-backend/pull/3799/files Here is my workflow file: https://github.com/KlubJagiellonski/pola-backend/blob/e5ca38e042d4f67a2b74ab879283b7407266fd6b/.github/workflows/bi-transfer_pola_backend_to_bq.yml
To authenticate using Open ID Connect, I used the google-github-actions/auth action as below:
# Use Open ID Connect to authenticate to GCP
# For details, see:
# https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions
- id: 'auth'
  name: 'Authenticate to GCP'
  uses: 'google-github-actions/auth@v2'
  with:
    project_id: '${{ env.GCP_PROJECT_ID }}'
    # yamllint disable-line rule:line-length
    workload_identity_provider: 'projects/${{ env.GCP_PROJECT_NUMBER }}/locations/global/workloadIdentityPools/${{ env.GCP_IDENTITY_POOL }}/providers/${{ env.GCP_IDENTITY_PROVIDER }}'
Open ID Connect authentication on GCP is provided by the Workload Identity Federation feature. In my case, I use the (Preferred) Direct Workload Identity Federation setup. Here is a detailed setup guide: https://github.com/google-github-actions/auth?tab=readme-ov-file#preferred-direct-workload-identity-federation
Here are the docs that describe Open ID Connect on GitHub Actions: Security hardening your deployments - Use OpenID Connect within your workflows to authenticate with your cloud provider.
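For reference, here is a rough Go sketch of the request that the failing URL in the error corresponds to, as I understand it from the GitHub docs. GitHub Actions exposes `ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN` to jobs that have the `id-token: write` permission; the client appends an `audience` parameter and reads the JWT from the `value` field of the JSON response. The helper names (`idTokenURL`, `parseIDToken`, `fetchIDToken`), the 30-second timeout, and the placeholder audience are only for illustration and are not Sling's code. The `context canceled` part of the error suggests the context passed to this request was canceled before it finished, but that is a guess on my side.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// idTokenURL adds the audience query parameter to the request URL that
// GitHub Actions provides in ACTIONS_ID_TOKEN_REQUEST_URL.
func idTokenURL(requestURL, audience string) (string, error) {
	u, err := url.Parse(requestURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("audience", audience)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// parseIDToken extracts the JWT from a response body of the form {"value": "<jwt>"}.
func parseIDToken(body []byte) (string, error) {
	var resp struct {
		Value string `json:"value"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if resp.Value == "" {
		return "", fmt.Errorf("response has no \"value\" field")
	}
	return resp.Value, nil
}

// fetchIDToken requests an OIDC token for the given audience. The ctx must stay
// alive until the HTTP call returns; otherwise the call fails with "context canceled".
func fetchIDToken(ctx context.Context, audience string) (string, error) {
	requestURL := os.Getenv("ACTIONS_ID_TOKEN_REQUEST_URL")
	requestToken := os.Getenv("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
	if requestURL == "" || requestToken == "" {
		return "", fmt.Errorf("OIDC variables not set; does the job have the id-token: write permission?")
	}
	full, err := idTokenURL(requestURL, audience)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, full, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+requestToken)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d: %s", resp.StatusCode, body)
	}
	return parseIDToken(body)
}

func main() {
	// Hypothetical timeout; the point is that the context outlives the request.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Placeholder audience; substitute the real workload identity provider path.
	audience := "https://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL/providers/PROVIDER"
	token, err := fetchIDToken(ctx, audience)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not fetch ID token:", err)
		return
	}
	fmt.Printf("got an ID token of length %d\n", len(token))
}
```

Outside of a GitHub Actions job the two environment variables are missing, so the program only prints the error and exits.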
OK, created this PR: https://github.com/slingdata-io/sling-cli/pull/290
But I am unable to test, so I'll need your help for that.
The connection key audience is now accepted to create the TokenSource, which is then used as the AuthOption for creating the BQ client.
See Audience use here: https://pkg.go.dev/google.golang.org/api/idtoken#NewTokenSource I'm not sure what that audience URL should be, since it was not part of your python solution. Hopefully you can solve that part.
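One possible starting point for the audience: the failing request in the log above already carries an `audience` query parameter. The sketch below only decodes it with the Go standard library; `audienceFromTokenURL` and `failingURL` are names made up for this example, and whether the `audience` connection key should be set to this exact value is an assumption that still needs testing.

```go
package main

import (
	"fmt"
	"net/url"
)

// failingURL is the redacted subject-token URL copied from the log output above.
const failingURL = "https://pipelinesghubeus2.actions.githubusercontent.com/XXXXXXXX/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/XXXXXXX/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2Fprojects%2F354540873199%2Flocations%2Fglobal%2FworkloadIdentityPools%2Fgithub%2Fproviders%2Fpola-backend-repo"

// audienceFromTokenURL returns the decoded audience query parameter of a token request URL.
func audienceFromTokenURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	aud := u.Query().Get("audience")
	if aud == "" {
		return "", fmt.Errorf("no audience parameter in %q", rawURL)
	}
	return aud, nil
}

func main() {
	aud, err := audienceFromTokenURL(failingURL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: https://iam.googleapis.com/projects/354540873199/locations/global/workloadIdentityPools/github/providers/pola-backend-repo
	fmt.Println(aud)
}
```

The decoded value has the same shape as the `workload_identity_provider` setting in the workflow, prefixed with `https://iam.googleapis.com/`.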
Closing, feel free to open again.
Unfortunately, it still doesn't work.
I updated my PR to use our branch: https://github.com/KlubJagiellonski/pola-backend/pull/3799 I first run my Python script to confirm that all permissions work, and then I run your app compiled from the sources.
If you want, I can create a fork of your repository and configure identical access, and then give you access to the repository so that you can test freely. What do you think?
> I can create a fork of your repository and configure identical access, and then give you access to the repository so that you can test freely.
Yes, that would be great. Please fork from https://github.com/slingdata-io/sling-cli/pull/290