cli

Service Principal for bundle validate src error

Open · bicaluv opened this issue 2 months ago · 6 comments

Describe the issue

I want to use a Service Principal I configured for my workspace to run the actual validation and deployment from a serverless web terminal.

Configuration

See the databricks.yml and job YAML posted in the comments below.

Steps to reproduce the behavior


  1. Create a simple bundle containing only a job (under resources) with one task that calls a hello-world notebook (under src)
  2. Open a web terminal and switch to the existing bundle folder of the currently logged-in user
  3. Create the ~/.databrickscfg file with a profile dab_sp that stores the Azure service principal credentials
  4. Run export DATABRICKS_TOKEN= (empty value) to force use of the credentials from the config file
  5. Run databricks bundle validate -t dev -p dab_sp
  6. See the error
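
The steps above can be sketched roughly as follows. All host, tenant, client-ID, and secret values are placeholders, and the profile field names assume the Databricks CLI's unified Azure service principal authentication; this is a sketch, not a verified configuration:

```shell
# Step 3: store the Azure SP credentials in a dab_sp profile (placeholder values)
cat >> ~/.databrickscfg <<'EOF'
[dab_sp]
host                = https://adb-xxxxx.1.azuredatabricks.net
azure_client_id     = <application-id>
azure_tenant_id     = <tenant-id>
azure_client_secret = <client-secret>
EOF

# Step 4: clear any PAT from the environment so the profile credentials win
export DATABRICKS_TOKEN=

# Step 5: validate the dev target using the SP profile
databricks bundle validate -t dev -p dab_sp
```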

Expected Behavior

I would expect the command to use the given service principal for validation and deployment.

Actual Behavior

Getting this error message:

Error: notebook src/hello_notebook.ipynb not found

Name: dab_test
Target: dev
Workspace:
  Host: https://adb-xxxxx.1.azuredatabricks.net
  User: xxxx-xxxx-xxx-xxxx-xxxxx
  Path: /Workspace/Users/xxxx/.bundle/gen_dab_test/dev

OS and CLI version

Databricks CLI v0.270.0

Is this a regression?


Debug Logs

see attachment

dbg.zip

bicaluv commented Sep 29 '25 13:09

Thanks for reporting the issue.

Can you confirm whether the notebook's export format is .ipynb or .py?

If you refer to src/hello_notebook.ipynb then the export format needs to be .ipynb, even if it doesn't show the file extension in the workspace UI.
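
One way to confirm the export format from the web terminal is to query the object's metadata; this is a sketch assuming the CLI's workspace get-status command and a placeholder path:

```shell
# Returns object_type (NOTEBOOK vs FILE) and the notebook language for the path
databricks workspace get-status /Users/<you>/<bundle-folder>/src/hello_notebook.ipynb
```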

pietern commented Sep 29 '25 14:09

The format is .ipynb; it is not displayed in the UI but can be checked in the web terminal.

BTW: validating with my current user instead of the service principal succeeds.

bicaluv commented Sep 29 '25 14:09

I suspect the problem is that the notebook files are present in the workspace folder of the user you log in with in the UI, not the SP's folder, because the bundle was never deployed for this SP.

@fjakobs @ilyakuz-db do you know more about it?

andrewnester commented Oct 01 '25 11:10

@andrewnester: that is why I set the workspace root folder in my databricks.yml to the current user's folder. BTW: why does it then find the resources folder and read the job.yml file?

bicaluv commented Oct 01 '25 11:10

@bicaluv can you share your databricks.yml configuration here? You likely have an include section that picks up other configuration files.

As to the original issue, does the SP you use have permissions to access your current user folder?

andrewnester commented Oct 01 '25 11:10

As to the original issue, does the SP you use have permissions to access your current user folder?

Where and how can I check this? Using ls -l does not show any specific restrictions.
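
A sketch of one way to inspect the folder's ACL from the terminal, assuming the CLI's workspace permission commands and placeholder values (the SP would need at least read access to the bundle root folder):

```shell
# Look up the numeric object ID of the user folder (placeholder path)
databricks workspace get-status /Users/<you>

# Query the ACL on that folder; the object type for folders is "directories"
databricks workspace get-permissions directories <object-id>
```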

databricks.yml:

# This is a Databricks asset bundle definition for datamind-allspark.
# The Databricks extension requires databricks.yml configuration file.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.

bundle:
  name: gen_dab_test

include:
  - resources/*.yml
 
workspace:
  root_path: /Workspace/Users/[email protected]/.bundle/${bundle.name}/${bundle.target}

targets:
  dev:
    default: true
    workspace:
      host: https://adb-123456.azuredatabricks.net
    run_as:
      service_principal_name: cbdb0852-xxx-1234-8616-xxxxx
    variables:
      cluster_policy_id:
        default: 30648123456
      pause_status:
        default: PAUSED
      warehouse_id:
        default: 76e354d1234567
    presets:
      tags:
        use-case: bicaluv_tests

    permissions:
      - level: CAN_MANAGE
        group_name: Data-Engineers
      - level: CAN_MANAGE
        service_principal_name: cbdb0852-xxx-43e2-xxx-efb05512345
      - level: CAN_MANAGE
        user_name: [email protected]


variables:
  cluster_policy_id:
    default: 306483B21234567
    description: The cluster Policy Id, depending on the stage we are working in.
  warehouse_id:
    default: 0
    description: The SQL Warehouse ID to use for interacting with materialized views.
  pause_status:
    default: UNPAUSED
    description: Should the Job be paused or unpaused?

And the job YAML referenced via the include section (resources/*.yml):

resources:
  jobs:
    test_job:
      name: test_job
      tasks:
        - task_key: notebook_runner
          email_notifications:
            on_failure:
              - [email protected]
          notebook_task:
            notebook_path: ../src/hello_notebook.ipynb
            source: WORKSPACE
          notification_settings: {}
          run_if: ALL_SUCCESS
          webhook_notifications: {}
      email_notifications: {}
      max_concurrent_runs: 1
      performance_target: PERFORMANCE_OPTIMIZED
      queue:
        enabled: true
      tags:
        "contact": "[email protected]"
      webhook_notifications: {}

bicaluv commented Oct 01 '25 13:10

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

github-actions[bot] commented Dec 09 '25 00:12