Bundle deploy / destroy errors on a few jobs only
Describe the issue
When using the deploy or destroy command, certain jobs are not created or deleted. The affected jobs are never the same ones, and it is only ever 5-10 out of a total of 100.
Running the command a second time will create or destroy the remaining jobs.
Configuration
Using the commands from local to a testing environment where no other jobs or bundles can conflict. None of the jobs are running or modified.
Steps to reproduce the behavior
- Run databricks bundle deploy ... (or databricks bundle destroy ...)
- See error
Expected Behavior
All jobs go through.
Actual Behavior
Some jobs don't go through.
OS and CLI version
Databricks CLI v0.239.1
Is this a regression?
Same error on Databricks CLI v0.236.0
Debug Logs
Error: terraform apply: exit status 1
Error: cannot delete job:
Error: cannot delete job:
Error: cannot delete job:
Error: cannot delete job:
Bundle destroy successfully.
Thanks for reporting the issue.
Does the output not include any error messages?
I suspect this could be related to some kind of rate limiting, but haven't seen this before.
No other errors, no. I get the regular logging, where all affected jobs are listed, then the operation starts, and then the only error is what I posted. The errors do not name the jobs, but the jobs are still present in the workspace. Running the command a second time deletes/creates them.
The creation error is a bit more verbose, but gives no more detail:
Error: cannot create job:
with databricks_job.star-slv-lnd-slv-dom,
on bundle.tf.json line 33893, in resource.databricks_job.star-slv-lnd-slv-dom:
33893: },
Error: cannot create job:
with databricks_job.tpa-raw-ini-brz-init,
on bundle.tf.json line 35802, in resource.databricks_job.tpa-raw-ini-brz-init:
35802: },
Error: cannot create job:
with databricks_job.virage-brz-inc-brz-mrg,
on bundle.tf.json line 38097, in resource.databricks_job.virage-brz-inc-brz-mrg:
38097: },
Error: cannot create job:
with databricks_job.virage-raw-inc-brz-lnd,
on bundle.tf.json line 43642, in resource.databricks_job.virage-raw-inc-brz-lnd:
43642: },
Error: cannot create job:
with databricks_job.virage-raw-ini-brz-lnd,
on bundle.tf.json line 45509, in resource.databricks_job.virage-raw-ini-brz-lnd:
45509: },
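Since a different set of jobs fails on every run, it can help to extract the affected resource names from the output before retrying. The following is only a rough sketch, assuming the error text has the shape pasted above; the helper name `failed_job_resources` is hypothetical and not part of the CLI. It only works for the create errors, because the delete errors do not name the job.

```python
import re

# Matches the "with databricks_job.<name>," line printed under each create error.
# Assumption: resource names contain only letters, digits, "_" and "-".
_RESOURCE_RE = re.compile(
    r"^[ \t]*with databricks_job\.([A-Za-z0-9_-]+),[ \t]*$", re.MULTILINE
)


def failed_job_resources(output: str) -> list[str]:
    """Return job resource names found in the error output, in order, without duplicates."""
    names: list[str] = []
    for name in _RESOURCE_RE.findall(output):
        if name not in names:
            names.append(name)
    return names


sample = """Error: cannot create job:
with databricks_job.star-slv-lnd-slv-dom,
on bundle.tf.json line 33893, in resource.databricks_job.star-slv-lnd-slv-dom:
33893: },
Error: cannot create job:
with databricks_job.tpa-raw-ini-brz-init,
on bundle.tf.json line 35802, in resource.databricks_job.tpa-raw-ini-brz-init:
35802: },
"""

print(failed_job_resources(sample))
```

Comparing the resulting lists across runs is a quick way to confirm that the failing set really changes each time.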
Still running into this issue at the moment, and it is only solved by deploying twice.
Is there a rate limit on the API that is not being respected?
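For anyone who needs to automate the deploy-twice workaround in the meantime, here is a minimal sketch of a retry wrapper. It assumes the CLI exits with a non-zero status when any job fails, which matches the "exit status 1" in the logs above but has not been checked for every case. The function name `run_with_retries` and its parameters are hypothetical.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=2, delay_seconds=0.0, runner=subprocess.run):
    """Run cmd up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or 0 if every attempt failed.
    `runner` is injectable so the logic can be exercised without the real CLI.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Optional pause before the next attempt.
            time.sleep(delay_seconds)
    return 0
```

A call such as `run_with_retries(["databricks", "bundle", "deploy"], attempts=2)` would mirror the manual workaround. This only papers over the failure and does nothing about the underlying cause.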
Thanks for bumping this. Could you send an email to [email protected] with the workspace IDs where this is happening? Then I can escalate internally. This seems to be an issue with the Jobs API rather than DABs itself.
Bump. Having the same issue. Out of 100-ish jobs/workflows, different ones fail on each DAB deployment.
Ran into the issue again this morning. Same behavior as before; running the deploy again fixes the issue. No other info than: Error: cannot create job: An unexpected error occurred
@pietern
Hi, I'm also consistently facing this issue.
On deploy, this happens for a seemingly random set of jobs (2-4 out of ~50):
Error: cannot create job: An unexpected error occurred
with databricks_job.XXX,
on bundle.tf.json line 736, in resource.databricks_job.XXX:
736: },
On destroy, Error: cannot delete job: An unexpected error occurred is printed. It takes a second destroy to wipe these out.
Appreciate the additional reports of this issue. The team is working on a backend fix to address the underlying issue.
In the meantime, not using tags on your jobs should reduce the probability of this happening.
A fix for this issue has been rolled out.
Could you retry deploying/destroying your bundles and see if the issue no longer occurs? Thank you!
Deployment now works without issues for me. Thanks @pietern!
Thanks @pietern, the bug was intermittent so I will trust the team and reopen a ticket if it comes up again :)