terraform-provider-databricks
[FEATURE] Pull/Update Repo by using commit hash
Hi there,
I love the new Repos in the Databricks Workspace. However, our ambition is to have it automated. So if one merges a PR to the MAIN branch, we would love to have it automatically pulled. It is easy to do from the UI. But in CI/CD we are forced to either craft an HTTP request to the API or install databricks-cli. Since we are investing in using the Terraform provider to handle our Databricks infra, it would be lovely if we could use Terraform alone for this.
Configuration
resource "databricks_repo" "dp_repo" {
  url    = var.main_dp_repo_url
  path   = "${var.main_dp_repo_path}/${var.repo_name}"
  branch = var.branch_name
  depends_on = [
    databricks_directory.dp_repo_dir
  ]
  commit_hash = var.commit_hash != "" ? var.commit_hash : null
}
Expected Behavior
I would love to see that changing the commit hash updates the repo in the Workspace.
Actual Behavior
Nothing. The internal state of Terraform will be changed but it does not have any real impact on the Workspace.
Steps to Reproduce
Run terraform apply while supplying a commit hash different from the one currently checked out in the Workspace.
Terraform and provider versions
Terraform v1.1.8 on windows_amd64
- provider registry.terraform.io/databrickslabs/databricks v0.5.4
- provider registry.terraform.io/hashicorp/azurerm v2.46.0
There is no API right now to do that - https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/update-repo supports only branch or tag. Please raise this issue to your solutions architect or customer success engineer.
@Nemeczek What api call/cli command do you currently use to checkout a commit hash?
Update a repo to the most recent commit of a remote branch or to a tag
Currently using this... But I am afraid that @alexott is right and there is no option to control the commit hash directly. On the other hand, I did not manage to force Terraform to do what the command above does - pull the latest commit.
@nfx @alexott the question is whether this feature request should be split in two: one about a concrete commit hash, and a second about using Terraform to just pull the latest. Because it would solve like 99% of our needs if this would just trigger the pull:
terraform apply -replace "databricks_repo.dp_repo"
@alexott isn't it a bug then?
https://github.com/databrickslabs/terraform-provider-databricks/blob/master/repos/resource_repo.go#L195-L209
it will do the checkout to a new branch/tag if their values are changing
So we can checkout different branch but we can't pull incoming changes from the current one?
Because the Terraform state doesn't change, no update is applied
True, that's why I hoped to achieve the pull by trying to force Terraform to apply by using "-replace"...
Question - where is this code used? Do you want to keep it up to date to run jobs, or also for interactive use?
The first. But I can already see that even if the provider supports this, I will run into this: https://community.databricks.com/s/question/0D53f00001VJn01CAD/repos-configuration-for-azure-service-principal
For interactive use I really do not mind using the CLI directly.
@alexott theoretically, we can force resource update by using a conjunction of github_branch data resource and some new field, like triggers, like on null_resource - https://github.com/databrickslabs/terraform-provider-databricks/blob/master/scripts/nightly/azureit.tf#L83-L86
@Nemeczek you'll be able to provide GITHUB_TOKEN to TF pipeline, right?
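A rough sketch of that idea using only what exists today, assuming the repo lives on GitHub, the integrations/github provider is configured (for example via GITHUB_TOKEN), and the legacy databricks-cli is installed and authenticated on the CI agent. The null_resource name and the reuse of var.repo_name as the GitHub repository name are illustrative assumptions; databricks_repo itself has no triggers field:

```hcl
# Look up the current head commit of the tracked branch on GitHub.
data "github_branch" "tracked" {
  repository = var.repo_name # assumption: same name as the GitHub repository
  branch     = var.branch_name
}

# Re-created whenever the branch head changes, which re-runs the pull.
resource "null_resource" "pull_repo" {
  triggers = {
    head_sha = data.github_branch.tracked.sha
  }

  provisioner "local-exec" {
    # Assumption: legacy databricks-cli, authenticated via environment variables.
    command = "databricks repos update --path ${databricks_repo.dp_repo.path} --branch ${var.branch_name}"
  }
}
```

This keeps the pull outside the provider, so Terraform only tracks that the command ran for a given head SHA, not the actual state of the repo in the Workspace.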
@nfx I think so. For now we are on Azure Devops Repos but we plan to move to GH at some point
similar functionalities for gitlab - https://registry.terraform.io/providers/gitlabhq/gitlab/latest/docs/data-sources/branch and azure devops - hm... can't really find the equivalent of branch data source. though you can emulate it with data.external and some script... overall, updating things like branch or tag should ideally go into azdo/github actions and invoked from pipelines per repo, though we're still discussing the best way to do this.
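For Azure DevOps, the data.external emulation could look roughly like the following. This is a sketch under assumptions: git and bash are available where Terraform runs, the agent is already authenticated against the repository, and the names branch_head and branch_head_script are made up for illustration:

```hcl
locals {
  # Prints {"sha":"<head commit>"} for the branch passed as the second argument.
  # With `bash -c SCRIPT A B`, A is available as $0 and B as $1.
  branch_head_script = <<-EOT
    sha=$(git ls-remote "$0" "refs/heads/$1" | cut -f1)
    printf '{"sha":"%s"}' "$sha"
  EOT
}

# The external data source expects a JSON object of strings on stdout.
data "external" "branch_head" {
  program = ["bash", "-c", local.branch_head_script, var.main_dp_repo_url, var.branch_name]
}

output "branch_head_sha" {
  value = data.external.branch_head.result.sha
}
```

The resulting data.external.branch_head.result.sha changes whenever the branch moves, so it can feed any change-detection mechanism, such as the triggers map of a null_resource.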
@alexott could you experiment with adding triggers_update field and local repository checkout?
For jobs we can just wait a bit :-)
@Nemeczek look into the git_source block in the job resource - it supports running from a specific commit. For service principals and repos - check the databricks_git_credential resource.
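A minimal sketch of that combination, assuming an Azure DevOps repository and a provider version that supports both git_source and databricks_git_credential. var.git_username, var.git_pat, var.cluster_id, the job name, and the notebook path are hypothetical placeholders, not values from this thread:

```hcl
# Git credential for the identity that runs the job.
resource "databricks_git_credential" "ado" {
  git_username          = var.git_username # hypothetical variable
  git_provider          = "azureDevOpsServices"
  personal_access_token = var.git_pat # hypothetical variable
}

# Job that checks out the notebook from a pinned commit at run time.
resource "databricks_job" "from_commit" {
  name = "run-from-commit" # hypothetical name

  git_source {
    url      = var.main_dp_repo_url
    provider = "azureDevOpsServices"
    commit   = var.commit_hash
  }

  task {
    task_key            = "main"
    existing_cluster_id = var.cluster_id # hypothetical variable

    notebook_task {
      # Hypothetical path, relative to the repository root.
      notebook_path = "notebooks/main"
    }
  }
}
```

Because the job fetches the code itself, no Workspace repo has to be pulled before each run; changing var.commit_hash changes the job definition, which Terraform does detect.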
Following up - is this issue still relevant?
Hey so the problem was to update/pull our Repos when the Pull Request was merged in Azure Devops. With the additions to Databricks APIs now we can manage Git Credentials so I am able to do the CICD by using PAT and Databricks CLI. So the issue is not relevant anymore