dbt-databricks icon indicating copy to clipboard operation
dbt-databricks copied to clipboard

PARSE_SYNTAX_ERROR when running multiple queries with run_query()

Open wcmbishop opened this issue 10 months ago • 4 comments

Describe the bug

When trying to run multiple semi-colon separated queries to Databricks with the run_query() macro, I get a syntax error PARSE_SYNTAX_ERROR.

Steps To Reproduce

Below is a sample macro that reproduces this error for me.

{% macro my_macro() -%}

  {% set sql_commands %}
    select 1;
    select 2;
  {% endset %}

  {{ run_query(sql_commands) }}

{%- endmacro %}

Expected behavior

My expectation is to be able to run a set of valid SQL statements separated by semi-colons. There are many examples of dbt macro code online (e.g. the grant_select macro in the docs here) that show multiple SQL queries in a macro being run together. I'm able to do this with the example macro above (and others) in Snowflake for example.

My specific use-case is to build a macro that generates schema grants from our dbt code (similar to the example grant_select linked to above). Currently I'm doing this via a work-around by making a custom macro run_queries() that takes a list of SQL statements and executes them each separately.

Screenshots and log output

Here's the error output in the command line from the example macro run above.

22:06:14  Encountered an error while running operation: Runtime Error
  Runtime Error
    
    [PARSE_SYNTAX_ERROR] Syntax error at or near 'select': extra input 'select'.(line 2, pos 4)
    
    == SQL ==
    select 1;
        select 2
    ----^^^
    /* {"app": "dbt", "dbt_version": "1.7.11", "dbt_databricks_version": "1.7.13", "databricks_sql_connector_version": "2.9.5", "profile_name": "fleet_databricks", "target_name": "dev", "connection_name": "macro_my_macro"} */

System information

The output of dbt --version:

Core:
  - installed: 1.7.11
  - latest:    1.7.11 - Up to date!

Plugins:
  - databricks: 1.7.13 - Up to date!
  - spark:      1.7.1  - Up to date!

The operating system you're using: macOS Sonoma Version 14.4.1

The output of python --version: Python 3.9.4

Additional context

I'm wondering if this is potentially related to the SparkSQL implementation in Databricks. I learned from our Databricks reps that each individual SQL command run in the Databricks SQL UI is generated against a separate SparkSQL environment.

wcmbishop avatar Apr 15 '24 22:04 wcmbishop

As you suspect, this is currently a limitation of the platform. I've raised the issue internally, but can't commit to a timeline yet. Thanks for your report.

benc-db avatar Apr 16 '24 17:04 benc-db

I guess in the mean time, one should do one of these instead:

{% macro my_macro() -%}
  {% set sql_commands = ['select 1', 'select 2'] %}
  {% for command in sql_commands %}
    {% do run_query(command) %}
  {% endfor %}
{%- endmacro %}

jeremyyeo avatar Apr 22 '24 11:04 jeremyyeo

@jeremyyeo yes, pretty much. I haven't written a macro for it, but that's basically what I do internally in places where dbt expects us to be able to execute multiple statements.

benc-db avatar Apr 29 '24 15:04 benc-db