
dbx sync repo should not have default profile

Open ep-mo opened this issue 2 years ago • 7 comments

Expected Behavior

dbx sync repo --dest-repo=test should default to use $DATABRICKS_HOST and $DATABRICKS_TOKEN environment variables for authentication.

Current Behavior

dbx sync repo --dest-repo=test defaults to --profile=DEFAULT, because the --profile argument is declared with [default: DEFAULT]. This means the current default behavior is to read the config file instead of the environment variables (the approach recommended for CI/CD pipelines). Since our CI/CD pipeline does not have a config file, we get the following error: Could not find a databricks-cli config for profile DEFAULT

Our current workaround is to set an empty profile: dbx sync repo --dest-repo=test --profile=

Steps to Reproduce (for bugs)

  • $DATABRICKS_HOST and $DATABRICKS_TOKEN must be specified
  • ~/.databrickscfg should not exist
  • run dbx sync repo --dest-repo=<repo>
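
The preconditions in the list above can be checked before running the command. This is only a sketch: `check_repro_preconditions` is a hypothetical helper name, not part of dbx, and the optional path argument exists purely so the check can be pointed at a different file.

```shell
# Hypothetical helper (not part of dbx): verifies the reproduction preconditions.
# Usage: check_repro_preconditions [path-to-cfg]   (defaults to ~/.databrickscfg)
check_repro_preconditions() {
  cfg_path="${1:-$HOME/.databrickscfg}"

  # Both environment variables must be set and non-empty.
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must both be set" >&2
    return 1
  fi

  # The config file must be absent, otherwise its DEFAULT profile is picked up.
  if [ -e "$cfg_path" ]; then
    echo "$cfg_path exists; remove or rename it to reproduce the error" >&2
    return 1
  fi

  return 0
}
```

If the function returns success, running dbx sync repo --dest-repo=<repo> without a --profile argument should produce the error quoted under Current Behavior.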

Context

CI/CD pipeline using dbx sync repo

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version:

ep-mo avatar Aug 14 '23 11:08 ep-mo

I have the same issue.

However, as a workaround, the databricks-cli can be configured with a DEFAULT profile by running an additional pipeline step as follows:

  - bash: |
      DATABRICKS_HOST='https://adb-XXXXXXXXXXX.azuredatabricks.net/' # This can be easily parametrized to avoid passing the value in clear text.
      DATABRICKS_TOKEN='XXXXXXXXXXXXXXXXXXXXXXXXXXX' # This can be easily parametrized to avoid passing the value in clear text.
      databricks configure --token --profile DEFAULT <<EOF
      $DATABRICKS_HOST
      $DATABRICKS_TOKEN
      EOF
    displayName: 'Databricks Configuration for repo Sync'

CristianSmau avatar Sep 14 '23 09:09 CristianSmau

Here's a workaround that can be quite useful, but it might not be the best fit for containerized deployments, especially when you're running dbx deploy inside a Docker container. In this scenario:

A: You'll need to create the cfg file in your Dockerfile during the build process.
B: Consequently, you'll have to rebuild the image whenever there are changes to these environment variables.

One thing to keep in mind is that this approach involves storing your secrets in plaintext in a file, which isn't ideal 👎. It's worth noting that cfg profiles are primarily designed for local development, which can make them less suitable for continuous integration and continuous deployment (CI/CD).

Using the profile can also complicate the typical process of customizing a container with environment variables. Since the cfg profile is created during the build, it's challenging to use the same image across multiple environments. This adds complexity when working with CI/CD tools like GitLab and GitHub Actions, which rely on secret environment variables.

Don't get me wrong—I'm currently using this workaround myself, but it's worth acknowledging that the issue still deserves attention.
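
Points A and B could possibly be softened by generating the file when the container starts rather than at build time. The sketch below is an assumption, not something tested in this thread: `write_databricks_cfg` is a hypothetical helper name, and the `[DEFAULT]` section with `host` and `token` keys follows the databricks-cli config file layout. The token still ends up in a plaintext file, so the concern above remains.

```shell
# Hypothetical entrypoint helper (not part of dbx or databricks-cli):
# writes a DEFAULT profile from environment variables at container start.
# Usage: write_databricks_cfg [path-to-cfg]   (defaults to ~/.databrickscfg)
write_databricks_cfg() {
  cfg_path="${1:-$HOME/.databrickscfg}"

  # Refuse to write a half-empty profile.
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must both be set" >&2
    return 1
  fi

  # Subshell keeps the restrictive umask from leaking into the caller;
  # the file is created readable and writable by the owner only.
  (
    umask 077
    printf '[DEFAULT]\nhost = %s\ntoken = %s\n' \
      "$DATABRICKS_HOST" "$DATABRICKS_TOKEN" > "$cfg_path"
  )
}
```

Called from an entrypoint script before dbx runs, this would let the same image be reused across environments, since the values come from the container's environment at run time.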

doug-cresswell avatar Sep 14 '23 12:09 doug-cresswell

The workaround we use in our CI/CD pipeline does not require a cfg file, and is quite simple. We just pass an empty --profile= argument (as described in the original issue), and dbx will fall back to using $DATABRICKS_HOST and $DATABRICKS_TOKEN. It took some trial and error to figure out.

Our current workaround is to set an empty profile: dbx sync repo --dest-repo=test --profile=
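
For pipelines that sometimes run with a named profile and sometimes with environment variables, the choice could be wrapped in a small function. This is a sketch under assumptions: `dbx_profile_arg` and the `DBX_PROFILE` variable are hypothetical names introduced here, and dbx itself does not read `DBX_PROFILE`.

```shell
# Hypothetical wrapper (not part of dbx): prints the --profile argument to pass.
# DBX_PROFILE is an assumed variable name used only by this sketch.
dbx_profile_arg() {
  if [ -n "${DBX_PROFILE:-}" ]; then
    # An explicitly requested profile wins.
    printf '%s' "--profile=$DBX_PROFILE"
  elif [ -n "${DATABRICKS_HOST:-}" ] && [ -n "${DATABRICKS_TOKEN:-}" ]; then
    # Empty profile: the workaround that makes dbx use the environment variables.
    printf '%s' "--profile="
  else
    # Nothing else available: same as dbx's current default.
    printf '%s' "--profile=DEFAULT"
  fi
}
```

The call would then look like dbx sync repo --dest-repo=test "$(dbx_profile_arg)".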

ep-mo avatar Sep 14 '23 12:09 ep-mo

Thanks @ep-mo, I didn't pick up on the empty profile workaround on my first read through. I will try to reproduce myself. Where you say

~/.databrickscfg should not exist

does this mean the empty profile workaround only works if the file is not present?

doug-cresswell avatar Sep 14 '23 14:09 doug-cresswell

does this mean the empty profile workaround only works if the file is not present?

No, I don't think it matters; off the top of my head, the workaround should work even if the cfg file is present. When you specify --profile=, dbx should use your environment variables, whether or not the file is there.

In the Steps to Reproduce section I just tried to explain how to reproduce the issue and get a stack trace. If you have a cfg file when you try to reproduce, dbx will just fall back to the DEFAULT profile in that file, and everything seems to work fine. However, the expected behavior is to use environment variables, not the cfg file, when you don't pass the profile argument.

ep-mo avatar Sep 14 '23 14:09 ep-mo

Looks like DBX got deprecated and replaced by DAB. Worth looking into the new package.

CristianSmau avatar Sep 14 '23 15:09 CristianSmau

Looks like DBX got deprecated and replaced by DAB. Worth looking into the new package.

As far as I know, dbx is not yet deprecated and is still maintained by Databricks Labs (source). Our CI/CD is built around dbx, so we will continue to use dbx, probably at least until DAB is generally available. But yeah, worth looking into the new package. If we started from scratch today, we would probably look into DAB, because it's expected to supersede dbx.

ep-mo avatar Sep 14 '23 16:09 ep-mo