dbt-databricks
noisy --fail-fast logs
A user reports that using the --fail-fast flag for scheduled job runs in dbt Cloud produces extremely noisy logs, which makes it difficult to find the original error and the actual issue.
- The job runs with a thread concurrency of 23
- Several models are running at the same time
- But --fail-fast says to terminate the run as soon as we hit a single error

The logging is interesting: we can see the Databricks adapter cancelling the connections, while queries that have already started keep trying to reach the server over connections that have been cancelled. That produces this error:
: Error during request to server: RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=0.21970534324645996/900.0, error-message=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., http-code=404, method=GetOperationStatus, no-retry-reason=non-retryable error, original-exception=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., query-id=b'\x01\xefn\x95\xdbi\x14\x0e\xa8\xf1\xd4Ca\x07B\x8d', session-id=None
In addition, there is Apache Spark-specific logging:
$anonfun$analyzeQuery$1(SparkExecuteStatementOperation.scala:541)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:527)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.analyzeQuery(SparkExecuteStatementOperation.scala:541)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$5(SparkExecuteStatementOperation.scala:633)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:532)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:633)
... 43 more
, operation-id=01ef6e95-cea5-18b1-8077-63b37a785969
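
To illustrate what a quieter log could look like, here is a minimal sketch of a `logging.Filter` that lets the first occurrence of a message through and only counts the repeats. It is not part of dbt-databricks. The class name `CollapseRepeats` is hypothetical, and the sketch assumes the repeated messages differ at most in the command ID embedded in them.

```python
import logging
import re
import threading

# Matches IDs shaped like 01ef6e95-db69-140e-a8f1-d4436107428d.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


class CollapseRepeats(logging.Filter):
    """Hypothetical filter: emit each distinct message once, count the rest."""

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()
        self.counts = {}  # normalized message -> number of times seen

    def filter(self, record):
        # Replace command IDs so messages that differ only by ID collapse.
        key = UUID_RE.sub("<id>", record.getMessage())
        with self._lock:
            self.counts[key] = self.counts.get(key, 0) + 1
            # Returning True lets the record through; only the first passes.
            return self.counts[key] == 1
```

Attached with `logger.addFilter(CollapseRepeats())`, the `counts` dictionary could be printed once at the end of the run to show how many duplicates were suppressed.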
databricks version: 1.8.5post2+6b29d329ae8a3ce6bc066d032ec3db590160046c
dbt version: versionless - 2024.9.239
Expected behavior
From the user: "I had assumed that was because we were using multiple threads, but I would expect it to fail nice and gracefully rather than provide a log consisting of 500 identical messages, and sometimes not even providing the original cause of the first model to fail."
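
As a rough illustration of the requested behavior, the sketch below runs tasks on a thread pool, remembers the first real failure, and counts later errors instead of logging each one. It does not reflect how dbt or the Databricks adapter is implemented. `FailFastRun` and `run_models` are hypothetical names, and each model is assumed to be a plain zero-argument callable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FailFastRun:
    """Hypothetical run state: keeps the first failure, counts the rest."""

    def __init__(self):
        self._lock = threading.Lock()
        self.cancelled = threading.Event()
        self.first_error = None  # (model_name, exception) of the root cause
        self.suppressed = 0      # errors seen after the first failure

    def record(self, model_name, exc):
        with self._lock:
            if self.first_error is None:
                # First failure: keep it and signal everyone else to stop.
                self.first_error = (model_name, exc)
                self.cancelled.set()
            else:
                # Likely fallout from cancellation: count it, do not log it.
                self.suppressed += 1

    def summary(self):
        if self.first_error is None:
            return "run succeeded"
        name, exc = self.first_error
        return f"first failure in {name}: {exc!r} ({self.suppressed} later errors suppressed)"


def run_models(models, threads=4):
    """Run a mapping of name -> zero-argument callable with fail-fast semantics."""
    state = FailFastRun()

    def worker(name, fn):
        if state.cancelled.is_set():
            return  # the run is already failing, so skip work not yet started
        try:
            fn()
        except Exception as exc:
            state.record(name, exc)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for name, fn in models.items():
            pool.submit(worker, name, fn)
    return state
```

Printing `state.summary()` once at the end of the run would give a single line naming the first model to fail, which is the information the user says sometimes goes missing.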