haskell-opaleye
haskell-opaleye copied to clipboard
SQL query optimiser
Hi,
As mentioned, I've been working on a little query optimiser. The goal is to simplify the rendered query so it's easier for a human to read. You can see my transformations at https://github.com/ocharles/rel8/blob/sql-simplifier/Rel8/Internal/SqlTransformations.hs - there is nothing rel8
specific.
It performs the following transformations:
-
Redundant columns are removed. For example
SELECT a FROM (SELECT a, b FROM foo) q
will get rewritten toSELECT a FROM (SELECT a FROM foo) q
, asb
is not used for anything. -
Some parenthesis are removed in expressions. I was seeing a lot of
WHERE (x) = (y)
, whenWHERE x = y
would be just as valid. -
Selects are compressed. Doing
SELECT * FROM (SELECT a FROM foo)
is the same as justSELECT a FROM foo
. This step is a little smarter than just collapsingSELECT *
, but the idea is the same. -
Drop
OFFSET 0
.SELECT * FROM foo OFFSET 0
is the same asSELECT * FROM foo
.
Opening to discuss and see what your thoughts are. I opted to this as a transformation on the AST as it's easier for me to think up the rules here, rather than try and fiddle with the query generation itself.
Do you think some of these optimizations be applied during the actual AST construction? Maybe we can construct the AST so that some of these are already incorporated.
See the last comment:
I opted to this as a transformation on the AST as it's easier for me to think up the rules here, rather than try and fiddle with the query generation itself.
Personally I'd rather AST generation be quick and simple, and then tidy up in a separate pass. In the future I'd like to prepare these queries, so you'd only do that pass once anyway.
I don't have a burning desire to rewrite queries unless they are actually performance optimizations, but if others are very keen for particular rewrites then I'm happy to have them.
Having human readable queries is nice for logging / debugging in production, however, since this is just an AST pass maybe it could be a separate package? opaleye-pprint
?
Yea, @xldenis has pretty much mentioned my main motivators - logging and debugging. Currently opaleye
can emit a ton of SQL, and when you get the logic wrong it can be very hard to work out where. Currently I've been just dumping stuff into EXPLAIN ANALYZE
and looking at row counts/rows removed by filters, but this isn't particularly efficient.
This certainly could belong is a separate library, but at the moment opaleye
hard-wires the query generation and pretty printing. That's not the end of the world, if people are happy with a different select :: IO
function though.
I think it makes even less sense to try to debug a query that's not the one that's actually running!
Let's integrate these. They should be pretty well understood and we have a lot of test cases.
I'm not quite sure what you mean - obviously the query that I generate is the query I run. I guess you mean it's harder to debug something that doesn't have a 1:1 correspondence with the Haskell code though.
Oh, I thought you were suggesting that we only rewrite the queries when a human wants to read them.
I do, but I read them in the PostgreSQL query logs, so I have to run them :smile:
This should all really be done on PrimQuery'
. not Select
.
Just wondering, can the PG query optimizer itself give us the canonical query back? Or, can we bind to the internal PG library (written in C?) to do this task?
Good question. I have no idea.
I doubt Pg does much with this ast - most optimisations will be based on query nodes. But I don't know the actual answer, but there's nothing in libpq that looks like it would do this
On Sun, 5 Mar 2017, 9:10 am tomjaguarpaw, [email protected] wrote:
Good question. I have no idea.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-284215249, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjnNqFSvGDzxCOm-jJm4uwRxVxOMyks5rinwagaJpZM4MP9YH .
@ocharles
We are having some speed issues with Opaleye generated queries. Do you think your work here can help with that issue?
Is there any way we can try your optimisations with the Opaleye query we have with us right now?
I doubt it, i optimise for readability, performance shouldn't change
On Wed, 29 Mar 2017, 2:48 am Sandeep.C.R, [email protected] wrote:
@ocharles https://github.com/ocharles
We are having some speed issues with Opaleye generated queries https://github.com/tomjaguarpaw/haskell-opaleye/issues/284. Do you think your work here can help with that issue?
Is there any way we can try your optimisations with the Opaleye query we have with us right now?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-289957473, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjt6_1u_vYvkCYNTnwsNUczBZSa6Oks5rqbhggaJpZM4MP9YH .
@ocharles Actually I think there is a lot of hope that the kinds of transformations you perform will help. If you look at the the referenced issue you will see that most of the time is spent in the plan/analyze phase. The generated queries are very large and I suspect anything that improves readability will also help improve plan/analyze speed. There's nothing in the queries (that have been privately shared with me) that is fundamentally different from the handwritten version, just a lot of noise.
Interesting! Then maybe using hasql and preparing the queries would be better (I've been looking into that with rel8)
On Wed, 29 Mar 2017, 8:49 am tomjaguarpaw, [email protected] wrote:
@ocharles https://github.com/ocharles Actually I think there is a lot of hope that the kinds of transformations you perform will help. If you look at the the referenced issue you will see that most of the time is spent in the plan/analyze phase. The generated queries are very large and I suspect anything that improves readability will also help improve plan/analyze speed. There's nothing in the queries (that have been privately shared with me) that is fundamentally different from the handwritten version, just a lot of noise.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-290011193, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjuc2B0cn0l6iU-AcG-orHawZoC90ks5rqg0AgaJpZM4MP9YH .