haskell-opaleye icon indicating copy to clipboard operation
haskell-opaleye copied to clipboard

SQL query optimiser

Open ocharles opened this issue 7 years ago • 17 comments

Hi,

As mentioned, I've been working on a little query optimiser. The goal is to simplify the rendered query so it's easier for a human to read. You can see my transformations at https://github.com/ocharles/rel8/blob/sql-simplifier/Rel8/Internal/SqlTransformations.hs - there is nothing rel8 specific.

It performs the following transformations:

  1. Redundant columns are removed. For example SELECT a FROM (SELECT a, b FROM foo) q will get rewritten to SELECT a FROM (SELECT a FROM foo) q, as b is not used for anything.

  2. Some parenthesis are removed in expressions. I was seeing a lot of WHERE (x) = (y), when WHERE x = y would be just as valid.

  3. Selects are compressed. Doing SELECT * FROM (SELECT a FROM foo) is the same as just SELECT a FROM foo. This step is a little smarter than just collapsing SELECT *, but the idea is the same.

  4. Drop OFFSET 0. SELECT * FROM foo OFFSET 0 is the same as SELECT * FROM foo.

Opening to discuss and see what your thoughts are. I opted to this as a transformation on the AST as it's easier for me to think up the rules here, rather than try and fiddle with the query generation itself.

ocharles avatar Mar 01 '17 17:03 ocharles

Do you think some of these optimizations be applied during the actual AST construction? Maybe we can construct the AST so that some of these are already incorporated.

xldenis avatar Mar 01 '17 17:03 xldenis

See the last comment:

I opted to this as a transformation on the AST as it's easier for me to think up the rules here, rather than try and fiddle with the query generation itself.

Personally I'd rather AST generation be quick and simple, and then tidy up in a separate pass. In the future I'd like to prepare these queries, so you'd only do that pass once anyway.

ocharles avatar Mar 01 '17 18:03 ocharles

I don't have a burning desire to rewrite queries unless they are actually performance optimizations, but if others are very keen for particular rewrites then I'm happy to have them.

tomjaguarpaw avatar Mar 01 '17 22:03 tomjaguarpaw

Having human readable queries is nice for logging / debugging in production, however, since this is just an AST pass maybe it could be a separate package? opaleye-pprint ?

xldenis avatar Mar 01 '17 22:03 xldenis

Yea, @xldenis has pretty much mentioned my main motivators - logging and debugging. Currently opaleye can emit a ton of SQL, and when you get the logic wrong it can be very hard to work out where. Currently I've been just dumping stuff into EXPLAIN ANALYZE and looking at row counts/rows removed by filters, but this isn't particularly efficient.

This certainly could belong is a separate library, but at the moment opaleye hard-wires the query generation and pretty printing. That's not the end of the world, if people are happy with a different select :: IO function though.

ocharles avatar Mar 01 '17 23:03 ocharles

I think it makes even less sense to try to debug a query that's not the one that's actually running!

Let's integrate these. They should be pretty well understood and we have a lot of test cases.

tomjaguarpaw avatar Mar 01 '17 23:03 tomjaguarpaw

I'm not quite sure what you mean - obviously the query that I generate is the query I run. I guess you mean it's harder to debug something that doesn't have a 1:1 correspondence with the Haskell code though.

ocharles avatar Mar 01 '17 23:03 ocharles

Oh, I thought you were suggesting that we only rewrite the queries when a human wants to read them.

tomjaguarpaw avatar Mar 01 '17 23:03 tomjaguarpaw

I do, but I read them in the PostgreSQL query logs, so I have to run them :smile:

ocharles avatar Mar 02 '17 10:03 ocharles

This should all really be done on PrimQuery'. not Select.

tomjaguarpaw avatar Mar 03 '17 10:03 tomjaguarpaw

Just wondering, can the PG query optimizer itself give us the canonical query back? Or, can we bind to the internal PG library (written in C?) to do this task?

saurabhnanda avatar Mar 05 '17 08:03 saurabhnanda

Good question. I have no idea.

tomjaguarpaw avatar Mar 05 '17 09:03 tomjaguarpaw

I doubt Pg does much with this ast - most optimisations will be based on query nodes. But I don't know the actual answer, but there's nothing in libpq that looks like it would do this

On Sun, 5 Mar 2017, 9:10 am tomjaguarpaw, [email protected] wrote:

Good question. I have no idea.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-284215249, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjnNqFSvGDzxCOm-jJm4uwRxVxOMyks5rinwagaJpZM4MP9YH .

ocharles avatar Mar 05 '17 10:03 ocharles

@ocharles

We are having some speed issues with Opaleye generated queries. Do you think your work here can help with that issue?

Is there any way we can try your optimisations with the Opaleye query we have with us right now?

sras avatar Mar 29 '17 01:03 sras

I doubt it, i optimise for readability, performance shouldn't change

On Wed, 29 Mar 2017, 2:48 am Sandeep.C.R, [email protected] wrote:

@ocharles https://github.com/ocharles

We are having some speed issues with Opaleye generated queries https://github.com/tomjaguarpaw/haskell-opaleye/issues/284. Do you think your work here can help with that issue?

Is there any way we can try your optimisations with the Opaleye query we have with us right now?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-289957473, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjt6_1u_vYvkCYNTnwsNUczBZSa6Oks5rqbhggaJpZM4MP9YH .

ocharles avatar Mar 29 '17 07:03 ocharles

@ocharles Actually I think there is a lot of hope that the kinds of transformations you perform will help. If you look at the the referenced issue you will see that most of the time is spent in the plan/analyze phase. The generated queries are very large and I suspect anything that improves readability will also help improve plan/analyze speed. There's nothing in the queries (that have been privately shared with me) that is fundamentally different from the handwritten version, just a lot of noise.

tomjaguarpaw avatar Mar 29 '17 07:03 tomjaguarpaw

Interesting! Then maybe using hasql and preparing the queries would be better (I've been looking into that with rel8)

On Wed, 29 Mar 2017, 8:49 am tomjaguarpaw, [email protected] wrote:

@ocharles https://github.com/ocharles Actually I think there is a lot of hope that the kinds of transformations you perform will help. If you look at the the referenced issue you will see that most of the time is spent in the plan/analyze phase. The generated queries are very large and I suspect anything that improves readability will also help improve plan/analyze speed. There's nothing in the queries (that have been privately shared with me) that is fundamentally different from the handwritten version, just a lot of noise.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-290011193, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjuc2B0cn0l6iU-AcG-orHawZoC90ks5rqg0AgaJpZM4MP9YH .

ocharles avatar Mar 29 '17 07:03 ocharles