[RFC] First-class catalog and schema setting

Open lennartkats-db opened this issue 1 year ago • 3 comments

Changes

This is the latest, experimental approach to adding a first-class notion of 'catalog' and 'schema'.

The basic idea is that databricks.yml can say

targets:
  dev:
    ...
    presets:
      catalog: dev
      schema: ${workspace.current_user.short_name} # the user's name, e.g. lennart_kats

  prod:
    ...
    presets:
      catalog: prod
      schema: finance

which will then configure the default catalog and schema for all resources in the bundle (pipelines, jobs, model serving endpoints, etc.).
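
To illustrate the intended effect, here is a minimal Python sketch. It is not the CLI's implementation. It assumes resources are plain dictionaries with optional `catalog` and `schema` keys, and that an explicitly set value on a resource takes precedence over the preset; the helper name `apply_presets` and the resource shape are hypothetical.

```python
from copy import deepcopy


def apply_presets(resources, presets):
    """Return a copy of resources with unset catalog/schema filled from presets.

    Hypothetical helper: real bundle resources use different field names per
    resource type; this only illustrates the defaulting behavior.
    """
    result = deepcopy(resources)
    for resource in result.values():
        for key in ("catalog", "schema"):
            # Assumption: only fill in a default; an explicit value wins.
            if resource.get(key) is None and presets.get(key) is not None:
                resource[key] = presets[key]
    return result


# Illustrative resources, roughly matching the 'dev' target above.
resources = {
    "my_pipeline": {"name": "etl"},
    "my_job": {"name": "report", "schema": "custom"},
}
configured = apply_presets(resources, {"catalog": "dev", "schema": "lennart_kats"})
```

In this sketch `my_pipeline` picks up both defaults, while `my_job` keeps its own schema and only gains the catalog.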

A caveat exists for notebooks, which need to use parameters to configure the catalog and schema. The catalog and schema parameter values are automatically passed to all job tasks, but each notebook still has to read those values itself. We check whether notebooks do this, and otherwise show a recommendation:

Recommendation: Use the 'catalog' and 'schema' parameters provided via 'presets.catalog' and 'presets.schema' using

  dbutils.widgets.text('catalog')
  dbutils.widgets.text('schema')
  catalog = dbutils.widgets.get('catalog')
  schema = dbutils.widgets.get('schema')
  spark.sql(f'USE {catalog}.{schema}')

  in src/notebook.ipynb:1:1

Note that the code above also helps for interactive notebook development scenarios: users can use the parameter widgets to set the catalog and schema they use during development.
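
The PR description does not spell out how the check for parameter usage works. As a rough illustration only, a text-based heuristic could look like the Python sketch below; the function name `missing_parameters` and the regular expression are assumptions, not the actual detection logic.

```python
import re

# Matches dbutils.widgets.get('catalog') or dbutils.widgets.get("schema").
WIDGET_GET = re.compile(r"""dbutils\.widgets\.get\(\s*['"](\w+)['"]\s*\)""")


def missing_parameters(source, required=("catalog", "schema")):
    """Return the required parameter names that the notebook source never reads."""
    used = set(WIDGET_GET.findall(source))
    return [name for name in required if name not in used]


notebook_source = """
dbutils.widgets.text('catalog')
dbutils.widgets.text('schema')
catalog = dbutils.widgets.get('catalog')
schema = dbutils.widgets.get('schema')
spark.sql(f'USE {catalog}.{schema}')
"""

# An empty list means no recommendation would be needed for this notebook.
missing = missing_parameters(notebook_source)
```

A heuristic like this would miss notebooks that read the widgets indirectly (for example through a helper module), which is one reason to surface the result as a recommendation rather than an error.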

Similarly, for Python and Wheel tasks, users must add some extra code to process a catalog and schema parameter. For Python tasks we show a similar recommendation; for wheel tasks we can't directly check for this.
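
For a Python task, the extra code could be as small as the sketch below. It assumes the values arrive as `--catalog` and `--schema` command-line arguments; the exact form in which the parameters are passed is not stated above, so treat the flag names and the helper `parse_catalog_schema` as hypothetical.

```python
import argparse


def parse_catalog_schema(argv=None):
    """Read the catalog and schema from command-line style task parameters."""
    parser = argparse.ArgumentParser()
    # Assumption: parameters are passed as --catalog / --schema flags.
    parser.add_argument("--catalog", required=True)
    parser.add_argument("--schema", required=True)
    # parse_known_args tolerates any other parameters the task receives.
    args, _unknown = parser.parse_known_args(argv)
    return args.catalog, args.schema


# Passing argv explicitly here; a real task would call parse_catalog_schema()
# and let argparse read sys.argv.
catalog, schema = parse_catalog_schema(["--catalog", "dev", "--schema", "lennart_kats"])
use_statement = f"USE {catalog}.{schema}"
```

The resulting `use_statement` could then be passed to `spark.sql`, mirroring the notebook recommendation.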

Tests

The tests are based on reflection, to make sure we have coverage for all current and future resources:

  • Each resource property that has a name like catalog, schema, parameters, etc. must have one test case or ignore rule
  • Each catalog/schema related property is tested, making sure they change to the default catalog/schema based on presets
  • Most tests are executed based on a declarative specification of the expected target state.
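
The reflection idea in the list above can be sketched in Python as follows. This is not the PR's test code: the dataclasses `Pipeline`, `Job`, and `Resources` are made-up stand-ins for the real resource types, and the helper names are hypothetical. The sketch walks every field, collects those whose name looks catalog/schema related, and reports any that have neither a test case nor an ignore rule.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Optional


# Made-up stand-ins for real resource types.
@dataclass
class Pipeline:
    name: str = ""
    catalog: Optional[str] = None
    schema: Optional[str] = None


@dataclass
class Job:
    name: str = ""
    parameters: list = field(default_factory=list)


@dataclass
class Resources:
    pipeline: Pipeline = field(default_factory=Pipeline)
    job: Job = field(default_factory=Job)


RELEVANT = ("catalog", "schema", "parameters")


def relevant_paths(obj, prefix=""):
    """Recursively collect dotted paths of fields with a relevant-looking name."""
    paths = []
    for f in fields(obj):
        path = f"{prefix}{f.name}"
        if any(word in f.name for word in RELEVANT):
            paths.append(path)
        value = getattr(obj, f.name)
        if is_dataclass(value):
            paths.extend(relevant_paths(value, prefix=path + "."))
    return paths


def uncovered(obj, test_cases, ignore_rules):
    """Return relevant paths that have neither a test case nor an ignore rule."""
    return [p for p in relevant_paths(obj)
            if p not in test_cases and p not in ignore_rules]


gaps = uncovered(
    Resources(),
    test_cases={"pipeline.catalog", "pipeline.schema"},
    ignore_rules={"job.parameters"},
)
```

Because the paths are discovered by reflection, adding a new field such as a hypothetical `Job.catalog` would show up in `gaps` until someone adds a test case or an ignore rule for it.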

lennartkats-db avatar Dec 09 '24 08:12 lennartkats-db

Test Details: go/deco-tests/12253819228

eng-dev-ecosystem-bot avatar Dec 10 '24 10:12 eng-dev-ecosystem-bot

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger: go/deco-tests-run/cli

Inputs:

  • PR number: 1979
  • Commit SHA: 204d3b08d13b90b71f3236313412f72b5333908e

Checks will be approved automatically on success.

github-actions[bot] avatar Dec 21 '24 08:12 github-actions[bot]

This PR has not received an update in a while. If you want to keep this PR open, please leave a comment below or push a new commit and auto-close will be canceled.

github-actions[bot] avatar Mar 12 '25 00:03 github-actions[bot]

This PR has not received an update in a while. If you want to keep this PR open, please leave a comment below or push a new commit and auto-close will be canceled.

github-actions[bot] avatar May 28 '25 00:05 github-actions[bot]