prql
Don't format s-strings
From #965, broadening that issue to being able to avoid the auto-formatting. For context, this can be important with non-standard SQL, like Snowflake's semi-structured data.
A couple of options for how we could do this:
- Implement an "ignore" option upstream, in https://github.com/shssoichiro/sqlformat-rs/issues/15. I'm not sure whether upstream would accept a PR (though it seems reasonable, and worst case we could maintain a fork)
- Replace s-strings after the auto-formatting, using `$s_string_n`. This requires passing s-strings all the way through the compiler, reducing the modularity of the compiler phases, but otherwise being fairly simple.
- Relying on the new `Options` struct to disable formatting entirely. This has the disadvantage of being an all-or-nothing option, but might be an acceptable temporary solution.
Yeah, we should never format s-strings. I think the second is the best option, and we can fall back to the third for the time being.
I'm going to look at this. I will explore a fix for this based on the second option above to see how feasible this is.
Great @BlurrechDev !
If that becomes unwieldy (i.e. we're passing a huge struct along), then we can reassess.
Feel free to post half-complete code — either for feedback or to merge something initial.
Great. If you need some pointers, this is how I'd do it:
- the SQL is generated here, so before that, I'd replace all s-strings in RQ AST with generated s-strings.
- for that, I'd implement rq::RqFold, similar to CidRedirector.
- generated s-strings should be something that will never appear in actual queries. Max's suggestion of `$s_string_1, $s_string_2, $s_string_3, ...` is ok.
- the s-string-extractor must return the new AST and something like a `Vec<(String, String)>` (vec of pairs of generated and actual s-strings).
- after SQL is generated and formatted, replace the generated placeholders with actual s-strings using basic string replacement.
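The steps above can be sketched roughly as follows. This is a minimal illustration, not the compiler's code: `Expr`, `extract_s_strings` and `restore_s_strings` are hypothetical names, the enum is a toy stand-in for the RQ AST (a real version would go through an `RqFold` implementation), and s-strings are treated as plain text with no interpolated expressions.

```rust
// Hypothetical, simplified stand-in for an RQ expression (assumption:
// the real RQ AST is much richer and would be walked with a fold).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Ident(String),
    SString(String),
}

/// Replace each s-string with a generated placeholder and record
/// (generated, actual) pairs so the originals can be restored later.
fn extract_s_strings(exprs: Vec<Expr>) -> (Vec<Expr>, Vec<(String, String)>) {
    let mut pairs: Vec<(String, String)> = Vec::new();
    let mut replaced = Vec::with_capacity(exprs.len());
    for expr in exprs {
        match expr {
            Expr::SString(actual) => {
                let generated = format!("$s_string_{}", pairs.len() + 1);
                pairs.push((generated.clone(), actual));
                replaced.push(Expr::Ident(generated));
            }
            other => replaced.push(other),
        }
    }
    (replaced, pairs)
}

/// After the SQL has been generated and formatted, swap the placeholders
/// back using basic string replacement.
fn restore_s_strings(formatted_sql: &str, pairs: &[(String, String)]) -> String {
    let mut sql = formatted_sql.to_string();
    // Walk the pairs in reverse so `$s_string_10` is replaced before
    // `$s_string_1`, which is a prefix of it.
    for (generated, actual) in pairs.iter().rev() {
        sql = sql.replace(generated.as_str(), actual);
    }
    sql
}
```

One detail worth noting in this sketch: because the numbered placeholders share prefixes, the replacement order matters. Delimiting each placeholder (for example with a trailing character) would be another way around that.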
This issue prevents folks from using the escape hatch, so bumping this to "Priority"...
I've done a little work on this, but it's much harder than I anticipated.
My idea was to replace s-strings with some unique identifier before compiling to SQL. After SQL is formatted, we can replace the identifier back with s-strings.
Starting PRQL:
from my_table
select s"COUNT ( DISTINCT {my_col})"
AST:
From: my_table
Select:
SString:
- "COUNT ( DISTINCT "
- my_col
- ")"
SStrings extracted:
From: my_table
Select:
SString:
- '_anchor_1'
- my_col
- '_anchor_2'
Compiled to SQL:
SELECT '_anchor_1'my_col'_anchor_2' FROM my_table
Formatted:
SELECT
'_anchor_1' my_col '_anchor_2'
FROM
my_table
Inject SStrings back in:
SELECT
COUNT ( DISTINCT my_col )
FROM
my_table
... which is pretty close to what we'd want. The spacing after COUNT and before DISTINCT was preserved, as intended. But because formatting adds spacing between anchors, there are spaces around my_col.
I'm not sure we want to merge this, as it feels like a workaround using hacky text manipulation.
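For illustration, the anchor round trip described above might look something like the sketch below. `Part`, `anchor_parts` and `inject_s_strings` are hypothetical names, and the s-string model (literal text interleaved with expression names) is an assumption made for the example, not the compiler's actual representation.

```rust
// Hypothetical model of an s-string: literal SQL text interleaved with
// interpolated expressions (assumption, for illustration only).
enum Part {
    Text(String),
    Expr(String),
}

/// Emit SQL in which every literal fragment is replaced by a quoted anchor,
/// returning the SQL together with (anchor, original text) pairs.
fn anchor_parts(parts: &[Part]) -> (String, Vec<(String, String)>) {
    let mut sql = String::new();
    let mut anchors = Vec::new();
    for part in parts {
        match part {
            Part::Text(text) => {
                let anchor = format!("'_anchor_{}'", anchors.len() + 1);
                sql.push_str(&anchor);
                anchors.push((anchor, text.clone()));
            }
            // Expressions are emitted as-is so they still get compiled
            // and formatted like any other column reference.
            Part::Expr(name) => sql.push_str(name),
        }
    }
    (sql, anchors)
}

/// Put the literal fragments back after formatting.
fn inject_s_strings(formatted: &str, anchors: &[(String, String)]) -> String {
    let mut sql = formatted.to_string();
    // The closing quote delimits each anchor, so `'_anchor_1'` never
    // matches inside `'_anchor_10'` and the order does not matter.
    for (anchor, text) in anchors {
        sql = sql.replace(anchor.as_str(), text);
    }
    sql
}
```

In this sketch the literal fragments come back unchanged, but any whitespace the formatter inserts between an anchor and `my_col` sits outside the anchors and so survives the injection, which is the spacing problem noted above.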
Is there anything to having the whole S-string as a variable?
So the expression to be formatted is:
SELECT
- '_anchor_1'my_col'_anchor_2'
+ $_s_string_
FROM my_table
...and then we replace the variable after the formatting? So the S-string is completely opaque to the formatter.
> a workaround using hacky text manipulation.
This used to be all of the compiler! 😀
That's a good idea, but a bit problematic because you have to translate my_col somehow. If we could do this separately, then that's the way to go.
I was thinking of starting on this. But is it now intractable — everything is an s-string since the stdlib changes?
Or could we format the expressions that go into the s-strings separately? That seems quite difficult, if I'm thinking about it correctly.
No, not really. The s-strings in std.sql.prql have a completely separate codepath and never land in the AST as s-strings.
So it's still as tractable as it was before.
Ah great! I hadn't realized that
I did remember you just asking about this: https://github.com/PRQL/prql/issues/2694#issuecomment-1575693384
:D
Sorry, that's bad memory even by my standards!