prql icon indicating copy to clipboard operation
prql copied to clipboard

Don't format s-strings

Open max-sixty opened this issue 2 years ago • 15 comments

From #965, broadening that issue to being able to avoid the auto-formatting. For context, this is can be important with non-standard SQL, like Snowflake's semi-structured data.

A couple of options for how we could do this:

  • Implement an "ignore" option upstream, in https://github.com/shssoichiro/sqlformat-rs/issues/15. I'm not sure whether upstream would accept a PR (though it seems reasonable, and worst case we could maintain a fork)
  • Replace s-strings after the auto-formatting, using $s_string_n. This requires passing s-strings all the way through the compiler, reducing the modularity of the compiler phases, but otherwise being fairly simple.
  • Relying on the new Options struct to disable formatting entirely. This has the disadvantage of being an all-or-nothing option, but might be an acceptable temporary solution.

max-sixty avatar Dec 17 '22 19:12 max-sixty

Yeah, we should never format s-strings. I think the second in is the best option and we can fallback to using the third for the time being.

aljazerzen avatar Dec 24 '22 14:12 aljazerzen

I'm going to look at this. I will explore a fix for this based on the second option above to see how feasible this is.

BlurrechDev avatar Jan 08 '23 18:01 BlurrechDev

Great @BlurrechDev !

If that becomes unwieldy (i.e. we're passing a huge struct along), then we can reassess.

Feel free to post half-complete code — either for feedback or to merge something initial.

max-sixty avatar Jan 08 '23 19:01 max-sixty

Great. If you need some pointers, this is how I'd do it:

  • the SQL is generated here, so before that, I'd replace all s-strings in RQ AST with generated s-strings.
  • for that, I'd implement rq::RqFold, similar to CidRedirector.
  • generated s-strings should be something that will never appear in actual queries. Max's suggestion of $s_string_1, $s_string_2, $s_string_3, ... is ok.
  • the s-string-extractor must return the new AST and something like a Vec<(String, String)> (vec of pairs of generated and actual s-strings).
  • after SQL is generated and formatted, replace the generated placeholders with actual s-strings using basic string replacement.

aljazerzen avatar Jan 09 '23 09:01 aljazerzen

This issue prevents folks from using the escape hatch, so bumping this to "Priority"...

max-sixty avatar May 09 '23 18:05 max-sixty

I've did a little work on this, but it's much harder than I anticipated.

My idea was to replace s-strings with some unique identifier before compiling to SQL. After SQL is formatted, we can replace the identifier back with s-strings.

Starting PRQL:

from my_table
select s"COUNT ( DISTINCT {my_col})"

AST:

 From: my_table
 Select:
   SString:
   - "COUNT ( DISTINCT "
   - my_col
   - ")"

SStrings extracted:

From: my_table
Select:
  SString:
  - '_anchor_1'
  - my_col
  - '_anchor_2'

Compiled to SQL:

SELECT '_anchor_1'my_col'_anchor_2' FROM my_table

Formatted:

SELECT
  '_anchor_1' my_col '_anchor_2'
FROM
  my_table

Inject SStrings back in:

SELECT
  COUNT ( DISTINCT  my_col )
FROM
  my_table

... which is pretty close to what we'd want. The spacing after COUNT and before DISTINCT was preserved, as intended. But because formatting adds spacing between anchors, there are spaces around my_col.

I'm not sure we want to merge this, as it feels like a workaround using hacky text manipulation.

aljazerzen avatar May 10 '23 08:05 aljazerzen

Is there anything to having the whole S-string as a variable?

So the expression to be formatted is:

SELECT 
- '_anchor_1'my_col'_anchor_2' 
+ $_s_string_
FROM my_table

...and then we replace the variable after the formatting? So the S-string is completely opaque to the formatter.


a workaround using hacky text manipulation.

This used to be all of the compiler! 😀

max-sixty avatar May 10 '23 09:05 max-sixty

That's a good idea, but a bit problematic because you have to translate my_col somehow. If can could do this separately, then that's the way to go.

aljazerzen avatar May 10 '23 11:05 aljazerzen

I was thinking of starting on this. But is it now intractable — everything is an s-string since the stdlib changes?

Or could we format the expressions that go into the s-strings separately? That seems quite difficult, if I'm thinking about it correctly.

max-sixty avatar Jun 10 '23 03:06 max-sixty

No, not really. The s-strings in std.sql.prql have a completely separate codepath and never land in the AST as s-strings.

So it still as tractable as it was before.

aljazerzen avatar Jun 10 '23 10:06 aljazerzen

Ah great! I hadn't realized that

max-sixty avatar Jun 10 '23 19:06 max-sixty

I did remember you just asking about this: https://github.com/PRQL/prql/issues/2694#issuecomment-1575693384

:D

aljazerzen avatar Jun 12 '23 06:06 aljazerzen

Sorry, that's bad memory even by my standards!

max-sixty avatar Jun 12 '23 22:06 max-sixty