prql icon indicating copy to clipboard operation
prql copied to clipboard

An implementation of a similar idea

Open gregsexton opened this issue 3 years ago • 4 comments

This is an exciting project to stumble across. I had very similar frustrations with SQL last year so I hacked on a prototype until I was happy. My implementation looks a lot like your spec. There are differences - certainly in syntax - but the differences are shallow.

My project has been my daily SQL driver for at least half a year now and works well - so I can validate your proposal makes sense. Being able to iterate faster on queries has completely changed my workflow.

I just uploaded the project so you can take a look: https://github.com/gregsexton/gql

I don't have an agenda here. Just thought you might like the anecdotal project validation. Maybe my project can give you some inspiration (maybe of what you don't want!). Feel free to grab from the implementation if it ramps you up faster. It's written in lisp though, which might be a turn off.

Let me know if you want to bounce ideas around. I tried a few different things before settling on what I have.

gregsexton avatar Jan 26 '22 10:01 gregsexton

Looks nice! Would love to see fewer brackets but I get their usefulness. Did you ever implement a syntax for recursive CTEs?

drittich avatar Jan 26 '22 18:01 drittich

@gregsexton it's amazing how similar it is!

Is the whole implementation in https://github.com/gregsexton/gql/blob/master/src/sqlgen/core.clj? I'm impressed at how terse it is if so.

If you have any words of wisdom for PRQL, they'd be very welcome...

max-sixty avatar Jan 27 '22 04:01 max-sixty

Is the whole implementation in https://github.com/gregsexton/gql/blob/master/src/sqlgen/core.clj?

Yes it is; my choice of lisp was tactical - it makes it super quick and easy to throw together DSLs like this. I don't need a parser and I was already aware of this excellent library that takes a data structure and outputs SQL, so I didn't have to write SQL rendering logic either. core.clj is responsible for the middle bit: taking an AST and manipulating that until it's in a form I'm happy to render. In other words, I only have to define the semantics of the language I want.

It's also worth pointing out that this isn't complete. There are SQL queries you can't currently express. I just keep adding to it as I hit these. Ignoring all the functions I haven't yet defined, I'd say this covers ~80-90% of SQL semantics.

Would love to see fewer brackets but I get their usefulness.

This is why I don't need a parser. It also makes editing really convenient, if your editor has support for structured editing. Swapping expressions around, removing them entirely, etc is a key stroke away.

But I didn't come here to sell lisp :D

Did you ever implement a syntax for recursive CTEs?

Ah, you mean to be able to do hierarchical queries? No! This would be very interesting to consider. The query engines I most often work with don't support this so I haven't given it much thought. How are you thinking about implementing this?

If you have any words of wisdom for PRQL

I can share top of mind what has worked really well and what I'm less happy with.

I've been very pleased with how window functions came out. I took a lot of inspiration from dplyr here. You do a group and then within that you define the operation you want to apply to the group: either summarize to collapse - aggregate - or mutate to add a column operating on the group - window. You can 'order the group' and that translates in to the window over clause getting an order by.

I emit SQL with CTEs to make it readable. This added some complexity but was a change that turned out to be valuable to me. By allowing users to incrementally build up queries you naturally end up with quite a lot of nesting - even taking care to only nest when it is necessary. I found that it's been really valuable to be able to work with the SQL that's generated. e.g. to share with others, manipulate from within tooling, etc.

Summarize, mutate, select. For me these do serve different purposes, but all they're really doing under the hood is manipulating the column selection. I think there's opportunity to simplify the language here and reduce the number of concepts.

I re-defined functions in my language rather than giving the user some sort of escape to specify that X is a function. e.g. you write (max a b) rather than (funcall max a b) or something like that. It's been annoying adding functions as I've discovered I've needed them, so maybe an esacpe would be nice and it would definitely scale better. But it allows the language to understand the semantics of functions, which in turn can simplify usage and potentially enrich them. A very simple example: you write (count) rather than (count *).

I'll stop there for now. Any tradeoffs you're wrestling with?

gregsexton avatar Jan 27 '22 11:01 gregsexton

One other thing I should definitely mention is that my language is really an embedded DSL. So I get a full programming language too around it. This isn't often useful - SQL is pretty capable - but being able to programatically build up queries can be very helpful. It gives you a powerful macro ability. e.g. if I want to generate queries for lots of different filtering combinations, maybe I need a query per graph on a dashboard, I can map over a data structure that I build up and have the language output many queries. I don't tend to overuse this but for simple examples like this one it can save time.

gregsexton avatar Jan 27 '22 11:01 gregsexton

Hi @gregsexton — I'm going over old issues, and was happy to come across this.

Interestingly, since the beginning of the year, PRQL has become even closer to GQL. For example, re:

I've been very pleased with how window functions came out. I took a lot of inspiration from dplyr here. You do a group and then within that you define the operation you want to apply to the group: either summarize to collapse - aggregate - or mutate to add a column operating on the group - window. You can 'order the group' and that translates in to the window over clause getting an order by.

...we changed our group-by approach from an argument — aggregate by:country [sum salary], to a pipeline — group [country] (aggregate [sum salary]), so the aggregate behaves the same in or outside a group.

I appreciate your message above, thanks for sharing. (And forgive me for not responding, I did engage with it at the time, FWIW)

If you ever have more guidance for PRQL, or ways of us collaborating, always feel free to open an issue / drop me a message / come say Hi if you drop by LA!

max-sixty avatar Oct 15 '22 04:10 max-sixty