Bundle generate and git paths

Open blood-onix opened this issue 1 year ago • 2 comments

Describe the issue

This is probably already well known, but I haven't found anything about it.

databricks bundle generate doesn't support Git paths, because it checks for the object's state in the workspace. Are there any plans to add support, or are any workarounds available? Most likely this object check should be skipped if the source is set to Git.
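To make the suggestion concrete, here is a rough sketch in Python of the kind of check being proposed. It is only an illustration of the logic, not the CLI's actual (Go) code, and the helper names `effective_source` and `notebooks_to_download` are made up for this example. It assumes the Jobs API behaviour as I understand it: a notebook task with no explicit `source` uses Git when the job defines `git_source`, and the workspace otherwise.

```python
def effective_source(notebook_task: dict, job_settings: dict) -> str:
    """Return "GIT" or "WORKSPACE" for a notebook task (hypothetical helper)."""
    explicit = notebook_task.get("source")
    if explicit:
        return explicit.upper()
    # Assumption: an unset source falls back to GIT when the job has a git_source.
    return "GIT" if job_settings.get("git_source") else "WORKSPACE"


def notebooks_to_download(job_settings: dict) -> list:
    """Collect only the notebook paths that live in the workspace."""
    paths = []
    for task in job_settings.get("tasks", []):
        notebook_task = task.get("notebook_task")
        if not notebook_task:
            continue
        if effective_source(notebook_task, job_settings) == "GIT":
            # Git-relative path such as src/data_pipelines/ingest:
            # skip the workspace status check and keep the reference as-is.
            continue
        paths.append(notebook_task["notebook_path"])
    return paths
```

With a job like the one in this report (a `git_source` plus the relative path `src/data_pipelines/ingest`), the function returns an empty list, so no workspace lookup would be attempted for that notebook.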

Configuration

Steps to reproduce the behavior

  1. Create a job whose notebooks are located in a Git repo.
  2. Run databricks bundle generate job --existing-job-id xxxxxx
  3. See the error.

Expected Behavior

The job YAML should be created.

Actual Behavior

[Two screenshots of the error; images not available.]

OS and CLI version

macOS 14.4.1, Databricks CLI v0.219.0

Is this a regression?

No. It also didn't work in v0.215.0, at least.

Debug Logs

11:09:41 INFO start pid=56457 version=0.219.0 args="databricks, bundle, generate, job, --existing-job-id, 380115427507209, --log-level=debug"
11:09:41 DEBUG Found bundle root at /Users/xxxxx/Projects/test_generate (file /Users/xxxxx/Projects/test_generate/databricks.yml) pid=56457
11:09:41 DEBUG Apply pid=56457 mutator=load
11:09:41 INFO Phase: load pid=56457 mutator=load
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=EntryPoint
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=scripts.preinit
11:09:41 DEBUG No script defined for preinit, skipping pid=56457 mutator=load mutator=seq mutator=scripts.preinit
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=ProcessRootIncludes
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=ProcessRootIncludes mutator=seq
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=VerifyCliVersion
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=EnvironmentsToTargets
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=InitializeVariables
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=DefineDefaultTarget(default)
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=LoadGitDetails
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=SelectDefaultTarget
11:09:41 DEBUG Apply pid=56457 mutator=load mutator=seq mutator=SelectDefaultTarget mutator=SelectTarget(dev)
11:09:41 DEBUG Loading profile MY because of host match pid=56457
11:09:41 DEBUG GET /api/2.1/jobs/get?job_id=380115427507209
< HTTP/2.0 200 OK
< {
<   "created_time": 1713442621561,
<   "creator_user_name": "xxxxx",
<   "job_id": 380115427507209,
<   "run_as_owner": false,
<   "run_as_user_name": "xxxxx",
<   "settings": {
<     SETTINGS_JSON_HERE
<   }
< } pid=56457 sdk=true
11:09:41 DEBUG non-retriable error: Path (src/data_pipelines/ingest) doesn't start with '/' pid=56457 sdk=true
11:09:41 DEBUG GET /api/2.0/workspace/get-status?path=src/data_pipelines/ingest
< HTTP/2.0 400 Bad Request
< {
<   "error_code": "INVALID_PARAMETER_VALUE",
<   "message": "Path (src/data_pipelines/ingest) doesn't start with '/'"
< } pid=56457 sdk=true
Error: Path (src/data_pipelines/ingest) doesn't start with '/'
11:09:41 ERROR failed execution pid=56457 exit_code=1 error="Path (src/data_pipelines/ingest) doesn't start with '/'"

blood-onix • May 07 '24 09:05

Thanks for reporting the issue.

This case isn't covered, indeed. Is your intent to copy the referenced files to your DAB as well, or to keep the Git reference? Use of Git references to code with DABs is an anti-pattern because the code and job definition then no longer originate from the same place (unless you always commit + push before doing deploys).

pietern • May 14 '24 10:05

Hey, thanks for the reply! We are planning to stay with a GitOps approach and start using bundles as a tool to migrate jobs between stages. The bundle itself, as well as the code, will be part of an automated release process, so we can be sure that the bundle and the code went through a PR and point to the same Git commit/tag.

For now I can simply export the JSON via the API and transform it to YAML, but it would be great to use the native generate function.
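For anyone who needs the same workaround, this is roughly what the export-and-transform step can look like. Treat it as a minimal sketch under a few assumptions: the host and token come from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables, the helper names (`fetch_job`, `job_to_bundle_resource`, `render`) are invented for this example, and the resource key is simply derived from the job name. To stay within the standard library it writes JSON, which YAML 1.2 parsers accept; if PyYAML is installed, `yaml.safe_dump` can be swapped in for block-style YAML.

```python
import json
import os
import re
import urllib.request
from typing import Optional


def fetch_job(job_id: int) -> dict:
    """Fetch a job definition from the Jobs API (same endpoint as in the debug logs)."""
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    token = os.environ["DATABRICKS_TOKEN"]
    request = urllib.request.Request(
        f"{host}/api/2.1/jobs/get?job_id={job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def job_to_bundle_resource(job: dict, key: Optional[str] = None) -> dict:
    """Wrap the job's settings under resources.jobs.<key>, leaving Git paths untouched."""
    settings = job["settings"]
    if key is None:
        name = settings.get("name") or f"job_{job['job_id']}"
        # Hypothetical naming rule: lower-case, non-alphanumerics collapsed to "_".
        key = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_").lower()
    return {"resources": {"jobs": {key: settings}}}


def render(bundle: dict) -> str:
    """Serialize as JSON, which is also valid YAML 1.2 flow syntax."""
    return json.dumps(bundle, indent=2)


if __name__ == "__main__":
    job = fetch_job(int(os.environ["JOB_ID"]))  # JOB_ID is an assumption of this sketch
    print(render(job_to_bundle_resource(job)))
```

The output can be saved into a file that the bundle includes and then checked with databricks bundle validate. Because the settings are copied verbatim, `git_source` and the relative notebook paths are kept exactly as the API returned them.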

blood-onix • May 14 '24 16:05

I would also like to see this feature in the bundle generate command. I don't see it as an anti-pattern because it doesn't assume how the user is using it. It would also help with gradual migrations to DAB.

Also, I think just maintaining the git source is more than enough for a phased approach. Automatically fetching the notebook from the git ref and converting the source would be a bit much in my opinion, so maybe a later phase (if ever).

I also wouldn't mind making my first contribution on this issue if needed!

zcking • Dec 27 '24 19:12

@blood-onix I have a working branch here if you just want to maintain the Git references, not download the notebooks: https://github.com/zcking/cli/tree/feature/support-git-source-on-generate

I would welcome any feedback from the maintainers, and will submit a PR if the change is welcomed!

zcking • Dec 28 '24 03:12