terraform-provider-databricks

[FEATURE] Pull/Update Repo by using commit hash

Open Nemeczek opened this issue 2 years ago • 17 comments

Hi there,

I love the new Repos in the Databricks Workspace. However, our ambition is to have it automated: if someone merges a PR to the MAIN branch, we would love to have it pulled automatically. It is easy to do from the UI, but in CI/CD we are forced to either weave in an HTTP request to the API or install databricks-cli. Since we are investing in using the Terraform provider to handle our Databricks infra, it would be lovely if we could use Terraform alone for this.

Configuration

resource "databricks_repo" "dp_repo" {
  url    = var.main_dp_repo_url
  path   = "${var.main_dp_repo_path}/${var.repo_name}"
  branch = var.branch_name
  depends_on = [
    databricks_directory.dp_repo_dir
  ]
  commit_hash = var.commit_hash != "" ? var.commit_hash : null
}

Expected Behavior

I would love to see that changing the commit hash updates the repo in the Workspace.

Actual Behavior

Nothing. The internal state of Terraform will be changed but it does not have any real impact on the Workspace.

Steps to Reproduce

Run terraform apply, supplying a commit hash different from the one currently checked out in the Workspace.

Terraform and provider versions

Terraform v1.1.8 on windows_amd64

  • provider registry.terraform.io/databrickslabs/databricks v0.5.4
  • provider registry.terraform.io/hashicorp/azurerm v2.46.0

Nemeczek avatar Apr 20 '22 07:04 Nemeczek

There is no API right now to do that - https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/update-repo supports only branch or tag. Please raise this issue to your solutions architect or customer success engineer.

alexott avatar Apr 20 '22 07:04 alexott

@Nemeczek What api call/cli command do you currently use to checkout a commit hash?

nfx avatar Apr 20 '22 07:04 nfx

Update a repo to the most recent commit of a remote branch or to a tag

Currently using this... But I am afraid that @alexott is right and there is no option to control the commit hash directly. On the other hand, I did not manage to force Terraform to do what the command above does: pull the latest commit.

Nemeczek avatar Apr 20 '22 07:04 Nemeczek

@nfx @alexott the question is whether this feature request should be split in two: one about a concrete commit hash, and a second about using Terraform to just pull the latest. It would solve like 99% of our needs if this just triggered the pull: terraform apply -replace "databricks_repo.dp_repo"

Nemeczek avatar Apr 20 '22 08:04 Nemeczek

@alexott isn't it a bug then?

https://github.com/databrickslabs/terraform-provider-databricks/blob/master/repos/resource_repo.go#L195-L209

nfx avatar Apr 20 '22 09:04 nfx

it will do the checkout to a new branch/tag if their values are changing
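
Since a changed tag value does trigger a checkout, a tag can stand in for a commit hash as a stopgap. A minimal sketch, assuming the `tag` argument is available in the provider version in use; `var.release_tag` is a hypothetical variable, and the other names come from the configuration in the issue:

```hcl
# Hypothetical variable: the Git tag the workspace repo should be pinned to.
variable "release_tag" {
  type = string
}

resource "databricks_repo" "dp_repo" {
  url  = var.main_dp_repo_url
  path = "${var.main_dp_repo_path}/${var.repo_name}"

  # Set tag instead of branch (only one of the two). Changing this value
  # between applies changes the Terraform state, so an update is applied.
  tag = var.release_tag
}
```

This only helps when the pipeline tags each merge; it does not pull new commits on an unchanged branch.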

alexott avatar Apr 20 '22 09:04 alexott

it will do the checkout to a new branch/tag if their values are changing

So we can check out a different branch but we can't pull incoming changes from the current one?

Nemeczek avatar Apr 20 '22 09:04 Nemeczek

Because the state in the terraform doesn't change, no update is applied

alexott avatar Apr 20 '22 10:04 alexott

True, that's why I hoped to achieve the pull by forcing Terraform to apply with "-replace"...

Nemeczek avatar Apr 20 '22 12:04 Nemeczek

Question - where is this code used? Do you want to keep it up to date to run jobs, or also for interactive use?

alexott avatar Apr 20 '22 12:04 alexott

The first. But I can already see that even if the provider supports this, I will run into this: https://community.databricks.com/s/question/0D53f00001VJn01CAD/repos-configuration-for-azure-service-principal

For interactive use I really do not mind using the CLI directly.

Nemeczek avatar Apr 20 '22 12:04 Nemeczek

@alexott theoretically, we can force a resource update by combining the github_branch data source with some new field, like triggers on null_resource - https://github.com/databrickslabs/terraform-provider-databricks/blob/master/scripts/nightly/azureit.tf#L83-L86

@Nemeczek you'll be able to provide GITHUB_TOKEN to TF pipeline, right?
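
Until such a field exists, something close can be sketched with features Terraform already has. This assumes Terraform 1.2 or later (for `replace_triggered_by`, so newer than the v1.1.8 reported above), a configured GitHub provider, and that the GitHub repository name matches `var.repo_name`; `repo_head` is a hypothetical resource name:

```hcl
# Read the current head commit of the branch from GitHub.
# Assumes the GitHub provider is configured, e.g. via GITHUB_TOKEN.
data "github_branch" "main" {
  repository = var.repo_name # assumption: same name as on GitHub
  branch     = var.branch_name
}

# Changes (and is therefore replaced) whenever the branch head moves.
resource "null_resource" "repo_head" {
  triggers = {
    sha = data.github_branch.main.sha
  }
}

resource "databricks_repo" "dp_repo" {
  url    = var.main_dp_repo_url
  path   = "${var.main_dp_repo_path}/${var.repo_name}"
  branch = var.branch_name

  lifecycle {
    # Recreate the workspace repo whenever null_resource.repo_head is replaced.
    replace_triggered_by = [null_resource.repo_head]
  }
}
```

Note that this replaces the repo (destroy and re-create) rather than pulling in place, so the repo ID changes on every new commit.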

nfx avatar Apr 20 '22 12:04 nfx

@nfx I think so. For now we are on Azure Devops Repos but we plan to move to GH at some point

Nemeczek avatar Apr 20 '22 12:04 Nemeczek

There is similar functionality for GitLab - https://registry.terraform.io/providers/gitlabhq/gitlab/latest/docs/data-sources/branch. For Azure DevOps... hm... I can't really find an equivalent of the branch data source, though you can emulate it with data.external and some script. Overall, updating things like branch or tag should ideally go into azdo/github actions invoked from pipelines per repo, though we're still discussing the best way to do this.

@alexott could you experiment with adding triggers_update field and local repository checkout?
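
The data.external emulation mentioned above could look roughly like this. It is only a sketch: it assumes `bash`, `git`, and `cut` are available on the machine running Terraform and that `git ls-remote` can authenticate against the repository; `branch_head` is a hypothetical name.

```hcl
# Ask the remote for the head commit of the branch and return it as JSON,
# which is the output format the external data source expects.
data "external" "branch_head" {
  program = [
    "bash", "-c",
    "printf '{\"sha\":\"%s\"}' \"$(git ls-remote ${var.main_dp_repo_url} refs/heads/${var.branch_name} | cut -f1)\""
  ]
}

# Changes whenever the branch head moves; other resources can react to it.
resource "null_resource" "branch_head" {
  triggers = {
    sha = data.external.branch_head.result.sha
  }
}
```

The external program runs on every plan, so it should stay cheap and free of side effects.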

nfx avatar Apr 20 '22 12:04 nfx

For jobs we can just wait a bit :-)

alexott avatar Apr 20 '22 12:04 alexott

@Nemeczek look into the git_source block in the job resource - it supports running from a specific commit. For service principals and repos, check the databricks_git_credential resource.
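
A minimal sketch of those two resources together, assuming a provider version newer than the v0.5.4 reported above. `var.git_username`, `var.git_pat`, `var.cluster_id`, the job name, and the notebook path are all hypothetical placeholders:

```hcl
# Git credential for the identity that runs Terraform.
resource "databricks_git_credential" "ado" {
  git_username          = var.git_username # hypothetical variable
  git_provider          = "azureDevOpsServices"
  personal_access_token = var.git_pat # hypothetical variable
}

resource "databricks_job" "dp_job" {
  name = "dp-job-pinned" # hypothetical job name

  # Run the job's code straight from Git at a specific commit,
  # without going through a workspace repo.
  git_source {
    url      = var.main_dp_repo_url
    provider = "azureDevOpsServices"
    commit   = var.commit_hash
  }

  task {
    task_key            = "main"
    existing_cluster_id = var.cluster_id # hypothetical variable

    notebook_task {
      # Path relative to the repository root (hypothetical notebook).
      notebook_path = "notebooks/main"
    }
  }
}
```

With this layout, changing `var.commit_hash` changes the job definition itself, so an apply has a real effect even though the workspace repo is untouched.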

alexott avatar May 08 '22 18:05 alexott

Following up - is this issue still relevant?

nfx avatar Aug 22 '22 09:08 nfx

Hey, so the problem was to update/pull our Repos when a Pull Request was merged in Azure DevOps. With the additions to the Databricks APIs we can now manage Git credentials, so I am able to do the CI/CD using a PAT and the Databricks CLI. So the issue is not relevant anymore.

Nemeczek avatar Sep 26 '22 12:09 Nemeczek