
Unable to execute multiple SQL statements on one entry in hooks config

Open · geo-martino opened this issue 1 year ago · 3 comments

Describe the bug

I am attempting to execute multiple SQL statements from various hooks, but I am unable to run more than one statement separated by ; within a single hook entry.

I am generating multiple GRANT statements with a macro. Below is an abridged version of the macro and its usage, showing how it is called from a single entry in both the on-run-start and on-run-end hooks in the dbt_project.yml file.

I find that if a macro generates multiple separate SQL statements, the hook fails, whereas if it generates just one SQL statement, it executes without error. I see this exact behaviour when configuring the on-run-start, on-run-end, pre-hook, and post-hook hooks.

As a workaround, I have changed the macros to execute each statement themselves by calling the run_query macro from within grant_access.
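For reference, a minimal sketch of that workaround, under the same identifiers as the macros below. grant_access_workaround is a hypothetical name, and the trailing select 1 is an assumption: a harmless no-op returned so that the hook entry itself still compiles to a runnable statement.

{# Hypothetical sketch of the workaround described above: each GRANT is executed
   immediately via run_query instead of being rendered into the hook's SQL. #}
{% macro grant_access_workaround(table_name) -%}
  {%- set group_name = "group" -%}
  {%- set catalog_identifier = "`catalog_name`" -%}
  {%- set schema_identifier = catalog_identifier ~ ".`schema_name`" -%}
  {%- set table_identifier = schema_identifier ~ ".`" ~ table_name ~ "`" -%}

  {%- if execute -%}
      {%- do run_query("GRANT USE CATALOG ON CATALOG " ~ catalog_identifier ~ " TO `" ~ group_name ~ "`") -%}
      {%- do run_query("GRANT USE SCHEMA ON SCHEMA " ~ schema_identifier ~ " TO `" ~ group_name ~ "`") -%}
      {%- do run_query("GRANT SELECT ON TABLE " ~ table_identifier ~ " TO `" ~ group_name ~ "`") -%}
      {%- do run_query("GRANT MODIFY ON TABLE " ~ table_identifier ~ " TO `" ~ group_name ~ "`") -%}
      {# Assumption: return a no-op statement so the hook has something to run;
         drop this if the adapter accepts an empty compiled hook. #}
      select 1
  {%- endif -%}
{%- endmacro %}

With this in place the hook entry keeps the same shape, e.g. "{{ grant_access_workaround('start_table') }}" under on-run-start.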

Steps To Reproduce

A failing macro which generates multiple SQL statements:

{% macro grant_access(table_name) -%}
  {%- set group_name = "group" -%}
  {%- set catalog_identifier = "`catalog_name`" -%}
  {%- set schema_identifier = catalog_identifier ~ ".`schema_name`" -%}
  {%- set table_identifier = schema_identifier ~ ".`" ~ table_name ~ "`" -%}

  {%- if execute -%}
      GRANT USE CATALOG ON CATALOG {{ catalog_identifier }} TO `{{ group_name }}`;
      GRANT USE SCHEMA ON SCHEMA {{ schema_identifier }} TO `{{ group_name }}`;
      GRANT SELECT ON TABLE {{ table_identifier }} TO `{{ group_name }}`;
      GRANT MODIFY ON TABLE {{ table_identifier }} TO `{{ group_name }}`;
  {%- endif -%}
{%- endmacro %}

A successful macro which generates only one SQL statement:

{% macro grant_access_catalog(table_name) -%}
  {%- set group_name = "group" -%}
  {%- set catalog_identifier = "`catalog_name`" -%}
  {%- set schema_identifier = catalog_identifier ~ ".`schema_name`" -%}
  {%- set table_identifier = schema_identifier ~ ".`" ~ table_name ~ "`" -%}

  {%- if execute -%}
      GRANT USE CATALOG ON CATALOG {{ catalog_identifier }} TO `{{ group_name }}`;
  {%- endif -%}
{%- endmacro %}

dbt_project.yml:

on-run-start:
  - "{{ grant_access('start_table') }}"
on-run-end:
  - "{{ grant_access_catalog('end_table') }}"

When I execute dbt run with this configuration, the on-run-start hook (which compiles to multiple statements) fails, while the on-run-end hook (a single statement) succeeds.

Expected behavior

The statements should execute successfully. The dbt documentation indicates that it is possible to build and execute multiple statements with Jinja within one entry for these config keys.

Screenshots and log output

[0m18:32:10.323706 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.8.7", "dbt_databricks_version": "1.8.7", "databricks_sql_connector_version": "3.1.2", "profile_name": "cibc", "target_name": "local", "connection_name": "master"} */

        GRANT USE CATALOG ON CATALOG `catalog_name` TO `deployments`;
        GRANT USE SCHEMA ON SCHEMA `catalog_name`.`schema_name` TO `deployments`;
        GRANT SELECT ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
        GRANT MODIFY ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
[0m18:32:10.324107 [debug] [MainThread]: Databricks adapter: Cursor(session-id=4a9c2a98-f79f-46cb-9b3c-cad9bea26f00, command-id=Unknown) - Created cursor
[0m18:32:10.548658 [debug] [MainThread]: Databricks adapter: Cursor(session-id=4a9c2a98-f79f-46cb-9b3c-cad9bea26f00, command-id=Unknown) - Closing cursor
[0m18:32:10.550652 [debug] [MainThread]: Databricks adapter: Exception while trying to execute query
/* {"app": "dbt", "dbt_version": "1.8.7", "dbt_databricks_version": "1.8.7", "databricks_sql_connector_version": "3.1.2", "profile_name": "cibc", "target_name": "local", "connection_name": "master"} */

        GRANT USE CATALOG ON CATALOG `catalog_name` TO `deployments`;
        GRANT USE SCHEMA ON SCHEMA `catalog_name`.`schema_name` TO `deployments`;
        GRANT SELECT ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
        GRANT MODIFY ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'USE'. SQLSTATE: 42601 (line 3, pos 14)

== SQL ==
/* {"app": "dbt", "dbt_version": "1.8.7", "dbt_databricks_version": "1.8.7", "databricks_sql_connector_version": "3.1.2", "profile_name": "cibc", "target_name": "local", "connection_name": "master"} */

        GRANT USE CATALOG ON CATALOG `catalog_name` TO `deployments`;
--------------^^^
        GRANT USE SCHEMA ON SCHEMA `catalog_name`.`schema_name` TO `deployments`;
        GRANT SELECT ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
        GRANT MODIFY ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`

Error properties: diagnostic-info=org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'USE'. SQLSTATE: 42601 (line 3, pos 14)

== SQL ==
/* {"app": "dbt", "dbt_version": "1.8.7", "dbt_databricks_version": "1.8.7", "databricks_sql_connector_version": "3.1.2", "profile_name": "cibc", "target_name": "local", "connection_name": "master"} */

        GRANT USE CATALOG ON CATALOG `catalog_name` TO `deployments`;
--------------^^^
        GRANT USE SCHEMA ON SCHEMA `catalog_name`.`schema_name` TO `deployments`;
        GRANT SELECT ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
        GRANT MODIFY ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`

	at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:49)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:786)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:624)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$5(SparkExecuteStatementOperation.scala:469)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.execution.SQLExecution$.withRootExecution(SQLExecution.scala:704)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:469)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:27)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:95)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:76)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:27)
	at com.databricks.spark.util.PublicDBLogging.withAttributionTags0(DatabricksSparkUsageLogger.scala:74)
	at com.databricks.spark.util.DatabricksSparkUsageLogger.withAttributionTags(DatabricksSparkUsageLogger.scala:174)
	at com.databricks.spark.util.UsageLogging.$anonfun$withAttributionTags$1(UsageLogger.scala:617)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:729)
	at com.databricks.spark.util.UsageLogging$.withAttributionTags(UsageLogger.scala:738)
	at com.databricks.spark.util.UsageLogging.withAttributionTags(UsageLogger.scala:617)
	at com.databricks.spark.util.UsageLogging.withAttributionTags$(UsageLogger.scala:615)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withAttributionTags(SparkExecuteStatementOperation.scala:71)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.$anonfun$withLocalProperties$11(ThriftLocalProperties.scala:195)
	at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:190)
	at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:71)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:71)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:446)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:432)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:482)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'USE'. SQLSTATE: 42601 (line 3, pos 14)

== SQL ==
/* {"app": "dbt", "dbt_version": "1.8.7", "dbt_databricks_version": "1.8.7", "databricks_sql_connector_version": "3.1.2", "profile_name": "cibc", "target_name": "local", "connection_name": "master"} */

        GRANT USE CATALOG ON CATALOG `catalog_name` TO `deployments`;
--------------^^^
        GRANT USE SCHEMA ON SCHEMA `catalog_name`.`schema_name` TO `deployments`;
        GRANT SELECT ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`;
        GRANT MODIFY ON TABLE `catalog_name`.`schema_name`.`start_table` TO `deployments`

	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:308)
	at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:114)
	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:137)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:106)
	at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:80)
	at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:101)
	at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:77)
	at com.databricks.sql.QueryRuntimePredictionUtils$.$anonfun$getParsedPlanWithTracking$2(QueryRuntimePrediction.scala:383)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:452)
	at com.databricks.sql.QueryRuntimePredictionUtils$.$anonfun$getParsedPlanWithTracking$1(QueryRuntimePrediction.scala:382)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionPhase(SQLExecution.scala:143)
	at com.databricks.sql.QueryRuntimePredictionUtils$.getParsedPlanWithTracking(QueryRuntimePrediction.scala:382)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$analyzeQuery$2(SparkExecuteStatementOperation.scala:608)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1180)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$analyzeQuery$1(SparkExecuteStatementOperation.scala:590)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:580)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.analyzeQuery(SparkExecuteStatementOperation.scala:590)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$5(SparkExecuteStatementOperation.scala:663)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:537)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:663)
	... 43 more

System information

The output of dbt --version:

Core:
  - installed: 1.8.7
  - latest:    1.8.7 - Up to date!

Plugins:
  - databricks: 1.8.7 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using: macOS Sequoia 15.0.1 (24A348)

The output of python --version: Python 3.12.7

Additional context

I am not sure whether this is an issue with dbt-databricks or dbt-core. I am posting here first because, according to the dbt-core documentation noted above, this should be supported; my guess is therefore that the issue stems from the adapter, but I am happy to forward this issue to dbt-core if I am wrong.

geo-martino · Oct 20 '24 22:10