Monitor deployment fails after being deployed once through the bundle

Open star-yar opened this issue 8 months ago • 8 comments

Describe the issue

I specify a quality monitor for a table. It gets deployed through CI using the CLI call (databricks bundle deploy -t our_target) under a service account. After the first deployment succeeds, all subsequent deployment attempts fail with: Error: cannot create quality monitor: Data Monitor 'catalog.schema.table' already exists

Configuration

Steps to reproduce the behavior

Please list the steps required to reproduce the issue, for example:

  • Create a table catalog.schema.table in UC
  • Create a quality monitor for it in the bundle
  • Deploy once with databricks bundle deploy (succeeds)
  • Deploy again with databricks bundle deploy (should succeed, but fails)

Expected Behavior

Deployments after the first one update the existing monitor (PUT call).

Actual Behavior

Deployments after the first one try to create the monitor again (POST call) and fail with the "already exists" error.
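
The difference between the expected and the actual behavior can be illustrated with a minimal in-memory sketch. This is not the CLI's or the Terraform provider's implementation; `FakeMonitorService`, `deploy_create_only`, and `deploy_create_or_update` are hypothetical names used only to show why a create-only (POST-like) call fails on the second deployment while a create-or-update flow does not.

```python
class MonitorAlreadyExists(Exception):
    """Raised when a create call targets a table that already has a monitor."""


class FakeMonitorService:
    """Hypothetical in-memory stand-in for a monitor API: one monitor per table."""

    def __init__(self):
        self._monitors = {}

    def create(self, table_name, config):
        # POST-like semantics: creating twice for the same table is an error.
        if table_name in self._monitors:
            raise MonitorAlreadyExists(
                f"Data Monitor '{table_name}' already exists"
            )
        self._monitors[table_name] = dict(config)

    def update(self, table_name, config):
        # PUT-like semantics: replace the config of an existing monitor.
        if table_name not in self._monitors:
            raise KeyError(table_name)
        self._monitors[table_name] = dict(config)

    def get(self, table_name):
        return self._monitors.get(table_name)


def deploy_create_only(service, table_name, config):
    """Reported behavior: every deployment issues a create."""
    service.create(table_name, config)


def deploy_create_or_update(service, table_name, config):
    """Expected behavior: create on the first deployment, update afterwards."""
    if service.get(table_name) is None:
        service.create(table_name, config)
    else:
        service.update(table_name, config)
```

Calling `deploy_create_only` twice for `catalog.schema.table` raises `MonitorAlreadyExists` on the second call, whereas calling `deploy_create_or_update` twice leaves a single monitor holding the latest configuration.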

OS and CLI version

OS: Ubuntu 24.04.2
CLI version: 0.224.1

star-yar avatar Mar 05 '25 19:03 star-yar

First, could you try upgrading to the latest CLI version and deploying again?

Secondly, when you deploy from CI, do you deploy to the same workspace.root_path, or to a different one for each deployment?

andrewnester avatar Mar 06 '25 10:03 andrewnester

> First, could you try to upgrade to the latest CLI version and try it again.

Updated to 0.243.0 – still fails

> do you deploy to the same workspace.root_path or a different one for each deployment?

same one

star-yar avatar Mar 06 '25 14:03 star-yar

Seeing the same as OP on Ubuntu: 24.04 / Databricks CLI: 0.243.0

gkinnell avatar Mar 11 '25 15:03 gkinnell

Facing the same issue, but only in local dev mode. CI/CD works as intended when deploying with a service principal. For local development, after the initial bundle deployment, I tested setting the quality monitor assets dir to both the User and the Shared folder in the workspace with mode: development, but the deployment fails because the monitor already exists:

Updating deployment state...
Error: terraform apply: exit status 1

Error: failed to create monitor

  with databricks_quality_monitor.gcaa-commercial-attribution_quality_monitor,
  on bundle.tf.json line 450, in resource.databricks_quality_monitor.gcaa-commercial-attribution_quality_monitor:
 450:       }

Already exists Monitor with ID:

andredmoliveira avatar Mar 11 '25 16:03 andredmoliveira

@gkinnell @andredmoliveira thanks for chiming in. Could you please share your bundle YAML configuration so we can try to reproduce the issue on our side? Thank you!

andrewnester avatar Mar 12 '25 09:03 andrewnester

Hi @andrewnester! Our bundle config file is very much in line with the mlops-stacks.

The default path for the quality monitor assets dir is the data product folder in the workspace, and we've tried adding a user-name folder when deploying to the dev workspace:


bundle:
  uuid: ...
  name: gcaa-commercial-attribution

workspace:
  root_path: /commercial_attribution/.bundle/${bundle.name}/${bundle.target}

variables:
  experiment_name:
    description: Experiment name for the model training.
    default: /commercial_attribution/.bundle/${bundle.name}/${bundle.target}-gcaa-commercial-attribution-experiment
  model_name:
    description: Model name for the model training.
    default: ${bundle.target}-gcaa-commercial-attribution-model
  catalog_name:
    description: The catalog name to save the trained model
  policy_id:
    description: Policy ID for cluster
  schema_name:
    description: The schema name linked to the Data Consumer Product
    default: commercial_attribution_gold
  quality_monitor:
    description: Quality Monitor Asset Directory
    default: /commercial_attribution/databricks_lakehouse_monitoring

include:
  - ./resources/batch-inference-workflow-resource.yml
  - ./resources/ml-artifacts-resource.yml
  - ./resources/model-workflow-resource.yml
  - ./resources/feature-engineering-workflow-resource.yml
  - ./resources/monitoring-resource.yml

targets:
  dev:
    mode: development
    default: true
    variables:
      catalog_name: dev_commercial
      policy_id: ...
      experiment_name: /commercial_attribution/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}-gcaa-commercial-attribution-experiment
      # quality_monitor: /commercial_attribution/${workspace.current_user.userName}/databricks_lakehouse_monitoring
    workspace:
      host: ...
      root_path: /commercial_attribution/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

  test:
    variables:
      catalog_name: qa_commercial
      policy_id: ...
    workspace:
      host: ...

  staging:
    variables:
      catalog_name: qa_commercial
      policy_id: ...
    workspace:
      host: ...

  prod:
    variables:
      catalog_name: commercial
      policy_id: ...
    workspace:
      host: ...

andredmoliveira avatar Mar 12 '25 09:03 andredmoliveira

> Facing same issue but only in local dev mode. CI/CD works as intended deploying with Service Principal.

Do you use the same CLI version on CI/CD and locally?

What if you prefix quality_monitor with /Workspace like quality_monitor: /Workspace/commercial_attribution/...?

andrewnester avatar Mar 12 '25 10:03 andrewnester

> Do you use the same CLI version on CI/CD and locally?

Same version 0.243.0

> What if you prefix quality_monitor with /Workspace like quality_monitor: /Workspace/commercial_attribution/...?

Same error.

Judging by how experiments and models behave during parallel local development, the problem seems to be that the quality monitor's name cannot be prefixed and it cannot get a different ID in development mode, so an existing monitor cannot be overwritten.
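
The hypothesis above can be sketched as a toy model. It assumes, without confirmation from the maintainers, that a deployment decides between create and update from state tracked per root_path, while a monitor is identified globally by its table name. All function and variable names here are hypothetical, and the sketch only covers the parallel dev-mode scenario, not the same-root_path case from the original report.

```python
def plan_action(state, resource_key):
    """Return 'update' if this deployment's state tracks the resource, else 'create'."""
    return "update" if resource_key in state else "create"


def deploy(states, existing_monitors, root_path, resource_key, table_name):
    """Toy deployment: state is looked up per root_path, monitors are global per table."""
    state = states.setdefault(root_path, set())
    action = plan_action(state, resource_key)
    if action == "create":
        # A monitor cannot be prefixed per user, so the table name is the only key.
        if table_name in existing_monitors:
            raise RuntimeError(f"monitor for '{table_name}' already exists")
        existing_monitors.add(table_name)
        state.add(resource_key)
    return action
```

In this model, a second deployment from the same root_path plans an update, but a deployment from a different per-user root_path has empty state, plans a create, and collides with the monitor that already exists on the shared table.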

andredmoliveira avatar Mar 12 '25 12:03 andredmoliveira

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

github-actions[bot] avatar May 31 '25 00:05 github-actions[bot]