Docker container local configuration

Open tombaeyens opened this issue 2 years ago • 2 comments

Please review these assumptions first:

You have built a Docker container to run a scan. In production, the container uses a BigQuery service account to run the scan. Developers want to run the same container locally to test scans, and when they do, they want to be able to use a different, non-service account.

With those assumptions I would propose the following approach:

The scan takes one or more configuration files as input, alongside the SodaCL check files. The configuration files contain the connection details, including GCP account credentials. So the goal here is to use different configuration files in production than locally on the developers' laptops. The scan configuration file can be referenced on the command line with the -c option, e.g.:

soda scan -d bq -c configuration-bg-prod.yml checks.yml

and

soda scan -d bq -c configuration-bg-dev.yml checks.yml

See also https://docs.soda.io/soda-core/scan-core.html#anatomy-of-a-scan-command
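As a rough illustration of the approach above, a small wrapper could choose the configuration file from an environment name. This is only a sketch: the function names `config_for_env` and `scan_command` and the `prod`/`dev` labels are hypothetical, while the file names and the scan command are the ones used in this thread. The command is printed rather than executed, so nothing here depends on Soda being installed.

```shell
#!/bin/sh
# Hypothetical helper: map an environment label to a configuration file.
# The file names are the ones from the example commands in this thread.
config_for_env() {
  case "$1" in
    prod) echo "configuration-bg-prod.yml" ;;
    dev)  echo "configuration-bg-dev.yml" ;;
    *)    echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the scan command for an environment.
# It only prints the command; running it is left to the caller.
scan_command() {
  config=$(config_for_env "$1") || return 1
  echo "soda scan -d bq -c $config checks.yml"
}
```

A developer could then inspect the output of `scan_command dev` before running it, while the production entrypoint would use `prod`.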

Does this help to find a solution for

  • Configuring the local developer account credentials
  • Avoiding the use of the service account when it's not wanted?

tombaeyens, Sep 02 '22 13:09

SODA-1129

jmarien, Sep 02 '22 13:09

Hi 👋🏼

Not sure if it covers your DTAP needs, but I currently use docker run -e KEY=VALUE to pass variables from the host to the container, where VALUE depends on the DTAP environment the container should run in. The provided configuration.yml references these variables, which it can resolve because the values are available as system environment variables inside the container. The result is one image/container that is suitable for multiple environments and configurable at run time. I use it to run Soda's Docker image in 4 different environments with Azure Pipelines, though any applicable CI/CD tool should work. It should also be perfectly suitable for ad hoc use from the CLI.

configuration.yml:

data_source your_data_source:
  type: sqlserver
  connection:
    host: ${SQL_SERVER}
    username: ${SQL_USERNAME}
    password: ${SQL_PASSWORD}
  database: ${SQL_DATABASE}
  schema: ${SQL_SCHEMA}
  trusted_connection: false
  encrypt: true
  trust_server_certificate: false
soda_cloud:
  host: cloud.soda.io
  api_key_id: ${SODA_API_ID}
  api_key_secret: ${SODA_API_SECRET}

Command that provides values for the variables referenced in configuration.yml (using PowerShell Core syntax in Azure Pipelines, but you get the point):

docker run `
    --rm `
    -v /path/to/your_soda_directory:/sodacl `
    -e SQL_SERVER=127.0.0.1 `
    -e SQL_USERNAME=sa `
    -e SQL_PASSWORD=****** `
    -e SQL_DATABASE=master `
    -e SQL_SCHEMA=dbo `
    -e SODA_API_ID=ab12345a-1a12-123a-12ab-a12aa1ab1234 `
    -e SODA_API_SECRET=****** `
    sodadata/soda-core:v3.0.10 scan -d your_data_source -c /sodacl/configuration.yml /sodacl/checks.yml
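Before calling docker run, it may help to confirm that every variable referenced in configuration.yml actually has a value on the host. The following is a minimal POSIX shell sketch of such a pre-flight check (not PowerShell, unlike the command above). The variable names are taken from the configuration.yml in this comment; the function name `missing_vars` and the `REQUIRED_VARS` list are hypothetical.

```shell
#!/bin/sh
# Names taken from the ${...} placeholders in configuration.yml.
REQUIRED_VARS="SQL_SERVER SQL_USERNAME SQL_PASSWORD SQL_DATABASE SQL_SCHEMA SODA_API_ID SODA_API_SECRET"

# Hypothetical helper: print the name of each unset or empty variable,
# one per line, and return non-zero if at least one is missing.
missing_vars() {
  missing=0
  for name in $REQUIRED_VARS; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return $missing
}
```

A pipeline step could call `missing_vars` first and stop early when it prints anything, so the scan does not start with unresolved placeholders.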

Hope this helps. Cheers!

geertvanzoest, Oct 05 '22 10:10