## How to parameterize a dbx Python notebook
The overall goal is to make the database name (prod/dev/test) dynamic for each notebook in a dbx job, and to pass that database name directly from Jenkins without modifying the notebook file or the deployment.yaml file for each environment.

I am creating a dbx job containing a few Databricks notebooks, and I want to pass the database name dynamically into each Python notebook without using Databricks widgets (assuming I use sys.argv to read the dbx CLI parameter). I want to run my job something like:

dbx launch --job "my_job_name" --parameter='{"db_name": "my_db_name"}'

and have that value sent to my job and all associated notebooks, which would read it via conf/deployment.yaml. In deployment.yaml I would have something like notebook_task with notebook_path: "/Reposs/My_github_repo/blala/notebookname" and base_parameters: db_name: "{{ env.db_name_from_env }}", as sketched below.
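For reference, the deployment fragment described above would look roughly like this (a sketch only; the path and the Jinja variable name are copied verbatim from the question and are placeholders):

```yaml
# conf/deployment.yaml (sketch; names and paths are the question's placeholders)
notebook_task:
  notebook_path: "/Reposs/My_github_repo/blala/notebookname"
  base_parameters:
    db_name: "{{ env.db_name_from_env }}"
```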
## Expected Behavior

## Current Behavior

## Steps to Reproduce (for bugs)

## Context

## Your Environment
- dbx version used: 0.7.4
- databricks-cli: 0.17.3
- spark_version: 12.2.x-scala2.12
- Databricks Runtime version: 12.2 LTS or above
Edit: I did not realise you specified a notebook task; I have updated the answer, with the original comment left underneath.
Edit 2: Updated the CLI snippets to use the same environment as the yml example.
To pass a value from a local environment variable into a notebook in a workflow definition, you should instead define the environment variable in the cluster configuration and read it in the notebook, e.g. database_name = os.environ.get('DATABASE_NAME'). This can be done in deployment.yml:
basic-cluster: &basic-cluster
  new_cluster:
    spark_version: "10.4.x-cpu-ml-scala2.12"
    spark_conf:
      <<: *basic-spark-conf
      spark.databricks.passthrough.enabled: false
    spark_env_vars:
      DATABASE_NAME: "{{ env['DATABASE_NAME'] }}"
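With that cluster definition, the notebook (or a plain Python file run on that cluster) can read the value without widgets; a minimal sketch, assuming a hard-coded fallback is acceptable for local runs:

```python
import os

# DATABASE_NAME is injected into the cluster via spark_env_vars above;
# the second argument is an assumed local/dev fallback for illustration.
database_name = os.environ.get("DATABASE_NAME", "dev")
print(f"Using database: {database_name}")
```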
See the original comment below for how to use Jinja with the deployment file.
Original comment
It is probably better practice to deploy separate workflows for separate environments, but to answer your question, you can use the Jinja support functionality (Jinja Support) combined with environment variables.
Also see Passing Parameters
Your deployment file should look something like this: conf/deployment.yml.j2
build:
  python: "pip"

environments:
  default:
    workflows:
      - name: "my-workflow"
        tasks:
          - task_key: "task1"
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"
              parameters: ["database_name", "{{ env['DATABASE_NAME'] }}"]
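On the receiving side, the entry point would then see these parameters as ordinary command-line arguments; a minimal sketch (the module layout behind the "some-ep" entry point is assumed, not from the dbx docs):

```python
# some_pkg/entrypoint.py -- hypothetical module behind the "some-ep" entry point
import sys

def main() -> None:
    # The parameters list from the deployment file arrives as plain CLI arguments,
    # e.g. sys.argv[1:] == ["database_name", "dev"]
    key, value = sys.argv[1], sys.argv[2]
    print(f"{key} = {value}")

if __name__ == "__main__":
    main()
```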
Deploy via CLI
export DATABASE_NAME=dev
dbx deploy --environment default --deployment-file conf/deployment.yml.j2 "my-workflow"
Launch via CLI
dbx launch --environment default --parameters="{\"python_params\": [\"database_name\", \"${DATABASE_NAME}\"]}" "my-workflow"
Note that the JSON is wrapped in double quotes (with the inner quotes escaped) so that the shell expands ${DATABASE_NAME}.
Note that you will need to append the .j2 extension to your YAML file, or alternatively enable in-place Jinja support in your project configuration.
I tried to follow your steps. Here is what my deployment.yaml.j2 looks like:

{% set db_name = env['db_name'] | default('name_of_my_db') %} ......basic config etc. etc. ...
spark_python_task: python_file: "file://my_path_/name_of_python_notebook_converted_to_job.py" parameters: ["db_name", "{{ env['db_name'] }}"] ............

Now I am trying to access this database name in my name_of_python_notebook_converted_to_job.py by calling db_name = json.loads(sys.argv[1]).get('python_params', [])[1].

I am calling the dbx CLI like dbx deploy --deployment-file conf/deployment.yaml.j2 "name_of_my_work_flow" and then, to launch the job, dbx launch --parameters='{"python_params":["db_name","${db_name}"]}' "name_of_my_work_flow".

It looks like my job can't read from sys.argv. I am getting the error JSONDecodeError: Expecting value: line 1 column 1 (char 0) at the line db_name = json.loads(sys.argv[1]).get('python_params', [])[1].
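A likely explanation, assuming dbx hands the spark_python_task its parameters list as separate command-line arguments (which is how Databricks passes them to a Python file task): sys.argv[1] is then the literal string db_name rather than a JSON document, so json.loads fails. A defensive sketch for that case:

```python
import sys

# Assumption: the task receives ["db_name", "<value>"] as plain positional
# arguments, so pair them up instead of JSON-decoding sys.argv[1].
args = sys.argv[1:]                                 # e.g. ["db_name", "my_db_name"]
params = dict(zip(args[0::2], args[1::2]))
db_name = params.get("db_name", "name_of_my_db")    # fallback mirrors the Jinja default above
print(f"db_name = {db_name}")
```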
If I use export DATABASE_NAME=dev and then dbx deploy -e dev --deployment-file conf/deployment.yml.j2 "my-workflow", it complains that environment dev is not found in the project file .dbx/project.json. In my project.json I have environments -> default -> profile, storage_type, properties -> workspace_directory, artifact_location.
JSONDecodeError
Notebooks use widgets to pass parameters, so you cannot pass parameters to a notebook task like you would for an entrypoint in a python wheel. You either need to use widgets, or define environment variables on the cluster using spark_env_vars. This way the environment variables will be available to the notebook through os.environ.
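For the widgets route, a minimal sketch of what the notebook cell would contain (the widget name "db_name" is assumed to match the base_parameters key; dbutils is only available inside Databricks):

```python
# Inside a Databricks notebook, base_parameters are surfaced as widgets.
dbutils.widgets.text("db_name", "dev_db")   # declares the widget with an assumed default
db_name = dbutils.widgets.get("db_name")    # returns the value passed via base_parameters
print(f"db_name = {db_name}")
```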
Environment Not Found Error
For the error "environment dev not found in the project file .dbx/project.json": the environments defined in your deployment YAML must match those in your project.json file.
environments:
  default:
You can use the dbx configure command to set up new environments in your project if you need multiple. If not, simply remove -e / --environment from your CLI commands and the "default" environment will be used.
dbx configure docs
project.json docs
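For illustration, a sketch of a .dbx/project.json defining both environments (the field layout follows what is described in this thread; the concrete values are placeholders):

```json
{
  "environments": {
    "default": {
      "profile": "DEFAULT",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/my_project",
        "artifact_location": "dbfs:/Shared/dbx/projects/my_project"
      }
    },
    "dev": {
      "profile": "DEFAULT",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/my_project_dev",
        "artifact_location": "dbfs:/Shared/dbx/projects/my_project_dev"
      }
    }
  }
}
```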
Thanks for your reply. Well, I converted the notebook to a pure Python file; no magics, no widgets, and no dbutils can or should be used, since we need to run unit tests locally. Hence I was expecting this plain Python file to be able to take the argument value from the CLI. It looks like it can't parse dbx launch --job "my_job_name" --parameter='{"db_name": "my_db_name"}'. My question is: why is the parameter's first field (key) "db_name" not showing up in my sys.argv? db_name = json.loads(sys.argv[1]).get('python_params', [])[1]
> FYI for next time, this kind of question is probably more appropriate for Stack Overflow than a GitHub issue.