Errors when embedding Presto inside of Python strings
Almost all of the SQL I write is inside of python data pipelines, using Spark and Presto flavoured syntax.
So my queries are written like
SqlQuery(
select="""
WITH users_in_threads AS (
SELECT
id,
FROM {THREAD_TABLE}
CROSS JOIN UNNEST(userid_array) AS t (userid)
)
SELECT
userid,
MAP_AGG(id, name) AS names,
COUNT() AS cnt
FROM {USERS_TABLE}
GROUP BY
1
"""
)
When using tree-sitter-sql via neovim, I extended it to also capture the content inside of these strings via an injection.
; extends
(string
(string_content) @injection.content
(#vim-match? @injection.content "^\w*SELECT|FROM|INNER|JOIN|UNION|WHERE|CREATE|DROP|INSERT|UPDATE|ALTER|WITH.*$")
(#set! injection.language "sql"))
This "works", using :InspectTree I can see it correctly captures SELECT/WITH statement, but once it gets to the python string interpolation FROM {THREAD_TABLE}, the parser errors.
This is quite a crazy use-case, but wondered if there is anything that can be done to make the parser looser in this context so it at least does not error out?
I do a similar thing but with Ruby e.g.
sql = <<~SQL
SELECT *
FROM table
WHERE id = %{id}
SQL
unfortunately there isn't a good way to make the parser recover gracefully since it's not actually valid SQL syntax. Tree-sitter should recover most of the time and not make things look completely broken but I have the occasional query where that's not the case.
Makes sense!
Due to the compilation steps of tree-sitter-sql, I take it there's no easy way for me to locally extend the definitions? As this is the only way I ever interact with SQL, I am OK (in my local version) introducing the concept of {PYTHON_VARIABLE} to the SQL syntax.
I think you'd have to fork the repo to extend the parser in that way. It's like not too hard, you could just add the curly brace ({}) characters to the identifier node and that might be enough.
https://github.com/DerekStride/tree-sitter-sql/blob/c649cb98ca91c660ba8a015aa1fcfc53b859b951/grammar.js#L3484
I was wondering if it would not be possible to have recursive injections. I mean nvim-treesitter knows that the embedded code is SQL, so why not apply the query for injections there as well.