Append Behaviour for cluster spark_conf
Hello guys,
First of all, thanks a lot for all your work on this project. It helps me a lot with automation and CI/CD.
However, I currently have trouble working with policies already defined in my Databricks workspace. My goal is to use an already existing policy (no problem with that) and to add some configuration in spark_conf that the policy doesn't have (for example if I want a Single Node cluster). One workaround would be to have another policy for single-node clusters only.
I saw PR https://github.com/databrickslabs/dbx/pull/532, which allowed appending init scripts and is quite similar.
Am I thinking about this the wrong way?
Thanks for your answers
Expected Behavior
I would expect a merge of both configurations.
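To make that concrete, here is a minimal sketch in plain Python (not dbx code; the policy key used here is only a hypothetical illustration) of the merge I would expect between the policy's spark_conf and the one from deployment.yml:

```python
# Illustration only: the merge I would expect, expressed as a plain dict update.
# Neither the function nor the policy key comes from dbx; they are just an example.

def expected_spark_conf(policy_conf: dict, deployment_conf: dict) -> dict:
    merged = dict(policy_conf)
    merged.update(deployment_conf)  # deployment.yml values complete/override the policy ones
    return merged

policy_conf = {"spark.databricks.passthrough.enabled": "true"}  # hypothetical policy entry
deployment_conf = {
    "spark.master": "local[*, 4]",
    "spark.databricks.cluster.profile": "singleNode",
}

print(expected_spark_conf(policy_conf, deployment_conf))
# -> keeps the policy entry and appends the single-node entries
```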
Current Behavior
Today I get invalid parameter values if I add spark_conf in the deployment.yml while using a policy.
Your Environment
- dbx version used: 0.8.7
- Databricks Runtime version: 10.4.x-scala2.12
It might have been a misunderstanding on my side: when specifying a policy name or id, the cluster configuration does not inherit from the policy. We have to make the configuration compliant with the policy; it does not inherit from it.
I made it work using the following parameter: `apply_policy_default_values: true`
Like this:

```yaml
sp-basic-cluster-props: &sp-basic-cluster-props
  policy_id: "cluster-policy://XXXX"
  spark_version: "10.4.x-scala2.12"
  num_workers: 0
  node_type_id: "m5a.large"
  aws_attributes:
    zone_id: "auto"
    availability: "ON_DEMAND"
  spark_conf:
    spark.master: "local[*, 4]"
    spark.databricks.cluster.profile: "singleNode"
  enable_elastic_disk: true
  apply_policy_default_values: true
```
Maybe we could add an example using this in the tests?
Hi @RaccoonForever, this parameter is something new to me. If it works, that's great; then most probably we don't even need the local policy preprocessing anymore.
cc @copdips, could you please check if disabling the policy preprocessor + enabling the property `apply_policy_default_value=true` gives you the same effect with init_scripts?
Hello @renardeinside,
By disabling the policy preprocessor, do you mean disabling `_deep_update()`?
From my understanding, `_deep_update()` is a sort of pre-validation at the dbx level. If we disable it, the validation will happen directly at the databricks-cli level (more precisely, at the Databricks Jobs API level) when the policy_id is given in the job definition.
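For readers following along: roughly speaking, a deep update is a recursive dict merge. A generic sketch (not the actual dbx `_deep_update()` implementation) could look like this:

```python
# Generic recursive dict merge, only to illustrate what a "deep update" does.
# This is NOT the dbx _deep_update() implementation.

def deep_update(base: dict, override: dict) -> dict:
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_update(result[key], value)  # recurse into nested dicts
        else:
            result[key] = value
    return result

policy = {"spark_conf": {"spark.databricks.passthrough.enabled": "true"}}
deployment = {"spark_conf": {"spark.databricks.cluster.profile": "singleNode"}}

print(deep_update(policy, deployment))
# -> {'spark_conf': {'spark.databricks.passthrough.enabled': 'true',
#                    'spark.databricks.cluster.profile': 'singleNode'}}
```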
A slight difference with init_scripts might be the script order. Suppose the following use case:
The cluster policy specifies `"init_scripts.0.dbfs.destination": script_1`, and the init_scripts in the deployment file specify `init_scripts: [script_2, script_1]`, i.e. script_1 comes after script_2.
- dbx with `_deep_update()` will dedup the init_scripts and generate `init_scripts: [script_1, script_2]`; script_2 now comes after script_1.
- but dbx without this dedup will keep `init_scripts: [script_2, script_1]`, and the Jobs API will return an error saying that script_1 must be the first script in the list because of `init_scripts.0` (see the sketch below).
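To make the ordering point concrete, here is a small sketch (plain Python, not dbx code) of the two behaviours described above:

```python
# Illustration of the init_scripts ordering issue described above (not dbx code).

policy_scripts = ["script_1"]                  # from "init_scripts.0.dbfs.destination": script_1
deployment_scripts = ["script_2", "script_1"]  # from the deployment file

# With a dedup that puts the policy scripts first (what the preprocessor effectively does):
deduped = policy_scripts + [s for s in deployment_scripts if s not in policy_scripts]
print(deduped)  # ['script_1', 'script_2'] -> script_1 is first, the policy is satisfied

# Without the dedup, the deployment list is sent as-is:
print(deployment_scripts)  # ['script_2', 'script_1'] -> violates init_scripts.0, the Jobs API rejects it
```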
Regarding `apply_policy_default_value=true` or `=false`, from my tests it seems that this param has no effect; it's not documented in the Jobs API doc either. I think this param is silently discarded by the Jobs API.
Regarding the original request about spark_conf appending, it works on my side without `apply_policy_default_value`.
@RaccoonForever, maybe my tests haven't covered your use cases; could you please share the error message?
I'll try to write up the specific failing execution next week! My trouble was specifically about spark_conf, as you said @copdips, and not init_scripts :)
I'll do my best to get it to you as fast as I can!