cli
`mode: production` target appears bugged during GitHub action deployment for v0.222 (`poetry` build)
Describe the issue
Deploying via DABs through GitHub actions fails within a production target for CLI v0.222.0.
Configuration
Workflow file:
```yaml
# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: "Prod deployment"

# Trigger this workflow whenever a pull request is pushed to the repo's
# main branch.
on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: "Deploy bundle"
    runs-on: ubuntu-latest

    steps:
      # Check out this repo, so that this workflow can access it.
      - uses: actions/checkout@v4

      # Download the Databricks CLI.
      # See https://github.com/databricks/setup-cli
      - uses: databricks/setup-cli@main

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install --all-extras

      # Deploy the bundle to the "prod" target as defined
      # in the bundle's settings file.
      - run: databricks bundle deploy --debug
        working-directory: .
        env:
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
          DATABRICKS_BUNDLE_ENV: prod
```
Steps to reproduce the behavior
- Run `databricks bundle deploy -t <prod-target>` via a GitHub action using the above workflow file.
Expected Behavior
Deployment should execute properly for any mode.
Actual Behavior
Deployment fails for `mode: production` and works for `mode: development`.
OS and CLI version
CLI: v0.222.0
OS: Ubuntu 22.04.4
Running on Windows 11 locally works fine for CLI v0.222.0.
Is this a regression?
Yes, using databricks/[email protected] fixes the issue.
Debug Logs
```
Run databricks bundle deploy --debug
22:25:48 INFO start pid=1662 version=0.222.0 args="databricks, bundle, deploy, --debug"
22:25:48 DEBUG Found bundle root at /home/runner/work/tjc-databricks/tjc-databricks (file /home/runner/work/tjc-databricks/tjc-databricks/databricks.yml) pid=1662
22:25:48 DEBUG Apply pid=1662 mutator=load
22:25:48 INFO Phase: load pid=1662 mutator=load
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=EntryPoint
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=scripts.preinit
22:25:48 DEBUG No script defined for preinit, skipping pid=1662 mutator=load mutator=seq mutator=scripts.preinit
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/Excel_files/dabs_xlsx_job.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/PDF_files/pdf_processing_dabs_job.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/amazon_reviews/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/capiq/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/capital_markets/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/data_rooms/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/deal_database/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/firehose/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/investor_relations/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/isg/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/omg/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq mutator=ProcessInclude(workflows/tax/resources.yml)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=VerifyCliVersion
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=EnvironmentsToTargets
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=InitializeVariables
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=DefineDefaultTarget(default)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=LoadGitDetails
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=PythonMutator(load)
22:25:48 DEBUG Apply pid=1662 mutator=load mutator=seq mutator=SelectTarget(prod)
22:25:48 ERROR Error: cannot merge int with string pid=1662 mutator=load mutator=seq mutator=SelectTarget(prod)
22:25:48 ERROR Error: cannot merge int with string pid=1662 mutator=load mutator=seq
22:25:48 ERROR Error: cannot merge int with string pid=1662 mutator=load
Error: cannot merge int with string
22:25:48 ERROR failed execution pid=1662 exit_code=1 error="cannot merge int with string"
Error: Process completed with exit code 1.
```
Note that destroying the target and/or manually deleting the `.bundle` directory and retrying yields the same issue.
Thanks for reporting this issue. Can you share (a snippet of) your bundle configuration?
We didn't change the merge logic that affects these code paths so I expect an issue upstream.
Being able to reproduce this would be very helpful.
Sure, please see our databricks.yml below. Our sub-YAML files are purely representative of workflows.
```yaml
# This is a Databricks asset bundle definition for tjc_databricks.
# See [REDACTED] for documentation.
bundle:
  name: tjc-databricks
  git:
    origin_url: [REDACTED]
    # branch: main

artifacts:
  default:
    type: whl
    build: poetry build
    path: .

include:
  - workflows/*/*.yml

variables:
  environment:
    description: The environment of the workflow
    default: dev
  principal_user:
    description: The principal user to run in production
    default: [REDACTED]
  tjc_excelsior_version:
    description: The version of `tjc-excelsior` to use
    default: 1.0.8
  tika_ocr_version:
    description: The version of `tika-ocr` to use
    default: 0.1.6
  pause_status:
    description: The status of scheduling for jobs. Only unpauses for prod.
    default: PAUSED
  pause_status_file_sync:
    description: The status of allowing file notifications. Only pauses for dev.
    default: UNPAUSED
  limit:
    description: The limit to use for testing
    default: 10

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: [REDACTED]
    variables:
      environment: dev
      pause_status_file_sync: PAUSED
  test:
    mode: development
    workspace:
      host: [REDACTED]
      root_path: /Users/${var.principal_user}/.bundle/${bundle.name}/${bundle.target}
    run_as:
      user_name: ${var.principal_user}
    variables:
      environment: test
  prod:
    mode: production
    workspace:
      host: [REDACTED]
      root_path: /Users/${var.principal_user}/.bundle/${bundle.name}/${bundle.target}
    run_as:
      user_name: ${var.principal_user}
    variables:
      environment: prod
      pause_status: UNPAUSED
      limit: "-1"
```
Thanks for providing the config. I'm able to reproduce.
The underlying problem is that we changed how we store variable values. All values used to be cast into a string, so you could use YAML strings, integers, and bools interchangeably and it would work. We changed this to accommodate complex-valued variables and now they can assume any type. Mixing types at the YAML level is what's causing the issue here.
We'll investigate further and figure out how to support this better.
In the meantime, you can work around the issue by making all variable values explicit strings:
```yaml
variables:
  # ...
  limit:
    description: The limit to use for testing
    default: "10"
```

Note the quotes around the value `10`.
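To see why the quotes matter: a YAML parser types unquoted scalars by pattern, so `10` becomes an integer while `"10"` stays a string. A simplified sketch (my own approximation of YAML core-schema scalar resolution, not PyYAML's or the CLI's actual code):

```python
import re

def yaml_scalar_type(token: str) -> type:
    """Rough sketch of how a YAML parser types a scalar (simplified assumption)."""
    if token.startswith(('"', "'")):
        return str  # quoted scalars are always strings
    if re.fullmatch(r"[-+]?\d+", token):
        return int  # bare digit runs resolve to integers
    if token in ("true", "false"):
        return bool
    return str      # everything else falls back to a plain string

assert yaml_scalar_type("10") is int      # default: 10          -> int
assert yaml_scalar_type('"10"') is str    # default: "10"        -> str
assert yaml_scalar_type("-1") is int      # prod:    limit: -1   -> int
assert yaml_scalar_type('"-1"') is str    # prod:    limit: "-1" -> str
```

With the unquoted default `10` (int) and the quoted prod override `"-1"` (str), the two sides of the merge carry different types, which matches the error in the logs.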
@pietern I'm also catching this error when setting job timeouts via a variable.
I've tried putting 7200 in quotes but I'm still getting the error "cannot merge int with string". Any workarounds?
Thanks!
```yaml
targets:
  qa:
    mode: production
    workspace:
      host: http....
      root_path: /Workspace/TEST/.bundle/${bundle.name}/${bundle.target}
    variables:
      timeout_seconds: 7200
      warning_seconds: 5400
    run_as:
      service_principal_name: ${var.spn}

resources:
  jobs:
    example_ingest:
      timeout_seconds: ${var.timeout_seconds}
      health:
        rules:
          - metric: RUN_DURATION_SECONDS
            op: GREATER_THAN
            value: ${var.warning_seconds}
```
OK, the workaround for my case is to remove timeout_seconds from the job template and push it only from the main bundle deployment file; in that case it works even without quotes.
@blood-onix This sounds like a different issue.
Did you hard-code `timeout_seconds: <some integer>` in your base definition?
If you mean the job template, then yes: the job was created manually via the UI and exported via `databricks bundle generate`, so `timeout_seconds` is set in the job template. The idea is to override the value for different targets, while dev uses the default value from the job template.
`../resources/example_ingest.yml`:
```yaml
resources:
  jobs:
    example_ingest:
      name: 'Example test ingest'
      email_notifications:
        on_duration_warning_threshold_exceeded:
          - [email protected]
        no_alert_for_skipped_runs: false
      webhook_notifications: {}
      timeout_seconds: 3200
      max_concurrent_runs: 1
      tasks:
        - task_key: Ingest
```
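A plausible reading of why quoting didn't help here (my assumption, consistent with the maintainer's question above): config trees are merged before `${var....}` references are resolved, so the target override is still the literal string `${var.timeout_seconds}` when it meets the template's integer `3200`. Removing the key from the template, as the reporter did, sidesteps the merge entirely. A minimal sketch of that model, not the CLI's actual merge code:

```python
def merge_trees(base, override):
    """Recursively merge two config trees; type-strict on scalars (assumption)."""
    if isinstance(base, dict) and isinstance(override, dict):
        out = dict(base)
        for key, value in override.items():
            out[key] = merge_trees(base[key], value) if key in base else value
        return out
    if type(base) is not type(override):
        raise TypeError(
            f"cannot merge {type(base).__name__} with {type(override).__name__}"
        )
    return override

template = {"timeout_seconds": 3200}                      # int from `bundle generate`
target = {"timeout_seconds": "${var.timeout_seconds}"}    # still a string at merge time

try:
    merge_trees(template, target)
except TypeError as e:
    print(e)  # cannot merge int with str

# The reporter's workaround: drop the key from the template, so there is
# nothing to merge against and the target's value is taken as-is.
assert merge_trees({}, target) == target
```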
This is helpful, thank you. Out of curiosity, is there a known reason why this only appears to affect us in the prod target and only via GitHub Actions? Our dev and test CI/CD works, and locally on Windows 11 I can successfully run `databricks bundle deploy -t prod`.
@arcaputo3 If the configuration you provided is complete, then it is because only the prod target overrides the variable value (with an incompatible type). For the other targets, it can use the default provided at the top level directly.
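The asymmetry can be sketched as follows (a hypothetical model of per-target variable resolution, not the CLI's actual code): targets that don't override a variable never trigger a merge, so only the target with the mismatched override fails.

```python
def resolve_variable(default, overrides, target):
    # Targets without an override use the default directly: no merge, no error.
    if target not in overrides:
        return default
    override = overrides[target]
    # Sketch (assumption): the merge is type-strict after v0.222.
    if type(default) is not type(override):
        raise TypeError(
            f"cannot merge {type(default).__name__} with {type(override).__name__}"
        )
    return override

default = 10                # variables.limit.default: 10         (YAML int)
overrides = {"prod": "-1"}  # targets.prod.variables.limit: "-1"  (YAML string)

assert resolve_variable(default, overrides, "dev") == 10   # no override, works
assert resolve_variable(default, overrides, "test") == 10  # no override, works
try:
    resolve_variable(default, overrides, "prod")           # int vs str mismatch
except TypeError as e:
    print(e)  # cannot merge int with str
```

This is also consistent with the local `-t prod` run succeeding on an older CLI: the type-strict merge only exists in the newer version.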
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.