
[ISSUE] `cluster_mount_info` is not a valid field

Open stevenayers-bge opened this issue 1 year ago • 1 comments

Description

Databricks Asset Bundles (DABs) configuration does not recognize cluster_mount_info as a valid field name, so you cannot provision job clusters that use it.

jobs:
  my_job:
    job_clusters:
      - job_cluster_key: cluster
        new_cluster:
          cluster_mount_info:
            - remote_mount_dir_path: /path/on/nfs
              local_mount_dir_path: /local/path
              network_filesystem_info:
                mount_options: <nfs-ops>
                server_address: <nfs-host>

Use Case

Databricks primarily interacts with object stores through FUSE, which makes the 'Small File Problem' very difficult to tackle because reads are bound by the object store's HTTP request limits. If you're trying to process billions of tiny (<5 KB) files, you can expect incredibly slow read times. In some edge cases, loading that data onto an NFS in the same network as your cluster nodes can massively improve performance.

Root Cause

The YAML input is normalized against the Go SDK jobs.JobSettings struct (https://github.com/databricks/cli/blob/main/bundle/deploy/terraform/tfdyn/convert_job.go#L17).

The new_cluster YAML definition is in turn normalized against the compute.ClusterSpec type in the Go SDK, but the Go SDK has been generated from an OpenAPI spec that doesn't contain cluster_mount_infos (https://github.com/databricks/databricks-sdk-go/blob/a823ca32fc4199d8cf2269b78cfe89331b4b688a/service/compute/model.go#L1090), so the field is rejected as unknown.
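To illustrate why a field missing from the generated struct gets rejected, here is a minimal sketch of struct-driven validation. It is not the CLI's actual normalization code: clusterSpec is a hypothetical, trimmed-down stand-in for the generated SDK type, the helper names knownFields and unknownKeys are made up for this example, and the printed message is illustrative only.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// clusterSpec is a HYPOTHETICAL stand-in for a generated SDK struct.
// Like the generated type described above, it has no field tagged
// "cluster_mount_info".
type clusterSpec struct {
	SparkVersion string `json:"spark_version,omitempty"`
	NodeTypeID   string `json:"node_type_id,omitempty"`
	NumWorkers   int    `json:"num_workers,omitempty"`
}

// knownFields collects the JSON field names declared on a struct type.
func knownFields(t reflect.Type) map[string]bool {
	fields := make(map[string]bool)
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("json")
		name, _, _ := strings.Cut(tag, ",") // drop options such as ",omitempty"
		if name != "" && name != "-" {
			fields[name] = true
		}
	}
	return fields
}

// unknownKeys returns, in sorted order, the keys of a decoded YAML mapping
// that have no matching field on the target struct. The target must be a
// struct value, not a pointer.
func unknownKeys(input map[string]any, target any) []string {
	known := knownFields(reflect.TypeOf(target))
	var unknown []string
	for key := range input {
		if !known[key] {
			unknown = append(unknown, key)
		}
	}
	sort.Strings(unknown)
	return unknown
}

func main() {
	// Roughly what the new_cluster block from the description decodes to.
	newCluster := map[string]any{
		"spark_version": "<spark-version>",
		"num_workers":   2,
		"cluster_mount_info": []any{
			map[string]any{
				"remote_mount_dir_path": "/path/on/nfs",
				"local_mount_dir_path":  "/local/path",
			},
		},
	}
	for _, key := range unknownKeys(newCluster, clusterSpec{}) {
		fmt.Printf("unknown field: %s\n", key) // prints "unknown field: cluster_mount_info"
	}
}
```

Because the set of accepted keys is derived entirely from the struct, the only way for a key to become valid in this scheme is for the struct itself to gain the field, which matches the solution proposed below.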

However, the cluster_mount_info block is supported by the Databricks Terraform provider v1.37.0, so this should be valid to use (https://github.com/databricks/terraform-provider-databricks/blob/v1.37.0/clusters/clusters_api.go#L433).

Solution

Ensure the Go SDK is generated with cluster_mount_infos in the OpenAPI spec.

P.S. More of a syntax note: just as Terraform's job_cluster is defined as the plural job_clusters in DABs, the same pluralization should apply to cluster_mount_info (https://github.com/databricks/cli/blob/main/bundle/deploy/terraform/tfdyn/convert_job.go#L23).

stevenayers-bge • Mar 24 '24 11:03

Thanks for reporting.

This field isn't part of our API specs (see https://docs.databricks.com/api/workspace/introduction) and therefore doesn't propagate into DABs. I'm checking with the relevant teams if this is intentional or not. For DABs, we intentionally rely on the API structure of these resources and not the TF one (where it deviates with singular/plural cases and a few others for aesthetic reasons). TF is an implementation detail for DABs.

pietern • Mar 25 '24 08:03

Closing in favor of this issue in the Go SDK: https://github.com/databricks/databricks-sdk-go/issues/866

andrewnester • Nov 13 '24 16:11