tree-sitter-sql icon indicating copy to clipboard operation
tree-sitter-sql copied to clipboard

Errors when embedding Presto inside of Python strings

Open tomspeak opened this issue 1 year ago • 4 comments

Almost all of the SQL I write is inside of python data pipelines, using Spark and Presto flavoured syntax.

So my queries are written like

SqlQuery(
  select="""
    WITH users_in_threads AS (
          SELECT
              id,
          FROM {THREAD_TABLE}
          CROSS JOIN UNNEST(userid_array) AS t (userid)
      )
    SELECT
        userid,
        MAP_AGG(id, name) AS names,
        COUNT() AS cnt
    FROM {USERS_TABLE}
    GROUP BY
        1
  """
)

When using tree-sitter-sql via neovim, I extended it to also capture the content inside of these strings via an injection.

; extends

(string 
  (string_content) @injection.content
    (#vim-match? @injection.content "^\w*SELECT|FROM|INNER|JOIN|UNION|WHERE|CREATE|DROP|INSERT|UPDATE|ALTER|WITH.*$")
    (#set! injection.language "sql"))

This "works", using :InspectTree I can see it correctly captures SELECT/WITH statement, but once it gets to the python string interpolation FROM {THREAD_TABLE}, the parser errors.

This is quite a crazy use-case, but wondered if there is anything that can be done to make the parser looser in this context so it at least does not error out?

tomspeak avatar Oct 11 '24 11:10 tomspeak

I do a similar thing but with Ruby e.g.

sql = <<~SQL
  SELECT *
  FROM table
  WHERE id = %{id}
SQL

unfortunately there isn't a good way to make the parser recover gracefully since it's not actually valid SQL syntax. Tree-sitter should recover most of the time and not make things look completely broken but I have the occasional query where that's not the case.

DerekStride avatar Oct 11 '24 13:10 DerekStride

Makes sense!

Due to the compilation steps of tree-sitter-sql, I take it there's no easy way for me to locally extend the definitions? As this is the only way I ever interact with SQL, I am OK (in my local version) introducing the concept of {PYTHON_VARIABLE} to the SQL syntax.

tomspeak avatar Oct 12 '24 19:10 tomspeak

I think you'd have to fork the repo to extend the parser in that way. It's like not too hard, you could just add the curly brace ({}) characters to the identifier node and that might be enough.

https://github.com/DerekStride/tree-sitter-sql/blob/c649cb98ca91c660ba8a015aa1fcfc53b859b951/grammar.js#L3484

DerekStride avatar Oct 15 '24 13:10 DerekStride

I was wondering if it would not be possible to have recursive injections. I mean nvim-treesitter knows that the embedded code is SQL, so why not apply the query for injections there as well.

matthias-Q avatar Oct 15 '24 18:10 matthias-Q