
Postgres: force generic plan for better nullability inference.

Open joeydewaal opened this issue 1 year ago • 13 comments

Currently, when using the query! and query_as! macros, sqlx runs EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE {statement_id} to get the query plan of a prepared statement/query. It then looks through the plan and uses it to make the nullability inference more accurate.

If a prepared statement has query parameters, sqlx needs to fill them in with valid arguments, and it uses NULL for every one because NULL is always a valid value.

This causes the query planner to return a plan that is optimized for the case where every parameter is NULL, which is not always representative of the real query.

This PR forces Postgres to use a generic execution plan when describing a prepared statement that has parameters, so the planner won't make optimizations based on the placeholder NULL values.

Fixes #3539, #1265, https://github.com/launchbadge/sqlx/issues/2796#issuecomment-1923456127, #3408, #2127

joeydewaal avatar Oct 03 '24 22:10 joeydewaal

I think providing a constant value may end up with the same problem in different queries.

I'm wondering if at a certain point, it would be easier to parse the SQL, just to get an idea of its structure. We'd still want to use the database for type analysis though.

abonander avatar Oct 05 '24 18:10 abonander

> I think providing a constant value may end up with the same problem in different queries.
>
> I'm wondering if at a certain point, it would be easier to parse the SQL, just to get an idea of its structure. We'd still want to use the database for type analysis though.

I just found out about plan_cache_mode = force_generic_plan, which gets the same result without having to provide a placeholder value for each parameter.

joeydewaal avatar Oct 05 '24 18:10 joeydewaal

That sounds like it has potential, yeah. Reducing the planning overhead might help with compile times a bit too.

Make sure to do SET LOCAL so it doesn't persist for the lifetime of the connection (you'll need to do it in a transaction). describe() can be called at runtime too.
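
For illustration, the describe step being discussed could look roughly like the sketch below. This is a simplification, not the PR's actual diff: generic_plan is a hypothetical helper, the SQL-level PREPARE/EXECUTE stands in for sqlx's protocol-level prepared statement, plan_cache_mode = force_generic_plan needs PostgreSQL 12+, and decoding the plan assumes serde_json plus sqlx's json feature.

```rust
use serde_json::Value;
use sqlx::{types::Json, Connection, PgConnection};

/// Hypothetical helper (not the PR's actual diff): fetch the query plan that
/// Postgres would use as a *generic* plan for `sql`, so that binding NULL for
/// every parameter no longer skews the plan that nullability is inferred from.
async fn generic_plan(conn: &mut PgConnection, sql: &str, params: usize) -> sqlx::Result<Value> {
    // SET LOCAL only lasts until the end of the transaction, so a runtime
    // describe() doesn't leave the setting behind on the connection.
    let mut tx = conn.begin().await?;

    // Requires PostgreSQL 12+.
    sqlx::query("SET LOCAL plan_cache_mode = force_generic_plan")
        .execute(&mut *tx)
        .await?;

    // SQL-level stand-in for the protocol-level prepared statement sqlx uses.
    sqlx::query(&format!("PREPARE _sqlx_describe AS {sql}"))
        .execute(&mut *tx)
        .await?;

    // EXPLAIN the statement with NULL for every parameter; with a forced
    // generic plan the NULLs can no longer be folded into the plan.
    let args = if params == 0 {
        String::new()
    } else {
        format!("({})", vec!["NULL"; params].join(", "))
    };
    let (Json(plan),): (Json<Value>,) =
        sqlx::query_as(&format!("EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE _sqlx_describe{args}"))
            .fetch_one(&mut *tx)
            .await?;

    // Clean up the SQL-level prepared statement and roll everything back.
    sqlx::query("DEALLOCATE _sqlx_describe").execute(&mut *tx).await?;
    tx.rollback().await?;

    Ok(plan)
}
```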

abonander avatar Oct 05 '24 19:10 abonander

Technically breaking since it can change the output of analysis.

Also kind of why I'm starting to think maybe we should parse the SQL. The output of EXPLAIN isn't meant to be stable at all.

abonander avatar Oct 05 '24 20:10 abonander

I think that's going to make the macros quite a bit slower, though.

joeydewaal avatar Oct 05 '24 20:10 joeydewaal

It'd be on-par with parsing the EXPLAIN output, and one less round-trip to the server (which is going to be where most of the overhead is anyway).

abonander avatar Oct 05 '24 21:10 abonander

> It'd be on-par with parsing the EXPLAIN output, and one less round-trip to the server (which is going to be where most of the overhead is anyway).

I had some time on my hands over the last few weeks, so I thought I'd experiment a bit with parsing the SQL myself. Currently sqlx needs to:

  1. Prepare the query.
  2. Fetch the attnotnull flag for the result columns.
  3. Get the query plan from the database (for Postgres).

That's 3 queries per macro invocation. Steps 2 and 3 combined take roughly 750μs-3ms in my testing (Postgres on localhost); the catalog lookup in step 2 is sketched below.
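
For context, the attnotnull lookup in step 2 is a catalog query along these lines. This is a rough sketch, not sqlx's exact query: column_not_null is a hypothetical helper, and it only works for result columns that Postgres can trace back to a table column (expressions and computed values have no table OID to look up).

```rust
use sqlx::postgres::types::Oid;
use sqlx::PgConnection;

/// Hypothetical helper: look up whether a result column is declared NOT NULL,
/// given the table OID and column number Postgres reports for it.
/// Returns None for columns that don't map back to a table column.
async fn column_not_null(
    conn: &mut PgConnection,
    table_oid: Oid,
    column_attnum: i16,
) -> sqlx::Result<Option<bool>> {
    sqlx::query_scalar(
        "SELECT attnotnull FROM pg_attribute WHERE attrelid = $1 AND attnum = $2",
    )
    .bind(table_oid)
    .bind(column_attnum)
    .fetch_optional(conn)
    .await
}
```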

With my implementation:

  1. Prepare the query.
  2. Fetch the database tables and columns (1 query, which can be cached).
  3. Parse the query and walk its structure.

So this means 1 query per macro invocation plus 1 query shared by all the macros (per database). My implementation takes between 10μs and 250μs in debug mode, depending on the size of the query.

So it's quite a bit faster, and it also gives a lot more control over nullability, which makes the macros more correct. My implementation is not complete and far from ready, though: it doesn't support schemas, and the integration with sqlx's query parameters is difficult, but it's enough to experiment with.

So it's definitely feasible, but it's a lot of work to get right.

Link to repo · Implementation in sqlx
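
To give a flavor of the parsing approach, here is a toy sketch using the sqlparser crate. This is not the linked implementation, and it assumes a recent sqlparser release (field names in its AST shift between versions):

```rust
use sqlparser::ast::{JoinOperator, SetExpr, Statement, TableFactor};
use sqlparser::dialect::PostgreSqlDialect;
use sqlparser::parser::{Parser, ParserError};

/// Toy sketch (not the linked implementation): collect the tables that sit on
/// the nullable side of an outer join in a plain SELECT, so their columns can
/// be reported as possibly NULL even when the table column is NOT NULL.
fn nullable_join_tables(sql: &str) -> Result<Vec<String>, ParserError> {
    let statements = Parser::parse_sql(&PostgreSqlDialect {}, sql)?;
    let mut nullable = Vec::new();

    for stmt in &statements {
        let Statement::Query(query) = stmt else { continue };
        let SetExpr::Select(select) = query.body.as_ref() else { continue };

        for table in &select.from {
            for join in &table.joins {
                // LEFT/FULL OUTER JOIN: the right-hand relation's columns can
                // come back NULL regardless of their declared nullability.
                if matches!(
                    join.join_operator,
                    JoinOperator::LeftOuter(_) | JoinOperator::FullOuter(_)
                ) {
                    if let TableFactor::Table { name, .. } = &join.relation {
                        nullable.push(name.to_string());
                    }
                }
            }
        }
    }

    Ok(nullable)
}
```

A real version would also have to handle RIGHT joins (which make the left side nullable), subqueries, CTEs, set operations, and schema-qualified names, which is where most of the remaining work lies.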

joeydewaal avatar Dec 23 '24 22:12 joeydewaal

So what's the current state of this PR? Is this going in the right direction or do we need a new PR implementing the SQL parsing idea?

musjj avatar Feb 14 '25 20:02 musjj

My earlier comment is more a proof of concept than anything serious. It's far from ready and a big change. There is also #1598, which would help with the compile time problem.

As for the PR, I'm not sure. It's the least costly way to solve this problem but not exactly pretty.

joeydewaal avatar Feb 24 '25 10:02 joeydewaal

I'm leaning more and more towards parsing the query, because I've been trying to debug the reports in #3546 and the query plan parsing, for SQLite especially, is just completely unapproachable if you don't already understand what it's doing. I've spent all day on it just to end up with a guess of what's going on.

abonander avatar Mar 01 '25 01:03 abonander

Parsing could also potentially reopen #1491, since it's a lot faster. Also, things like a conditional ORDER BY, LIMIT, or OFFSET don't affect nullability, so analysis results could even be cached across queries that only differ in those clauses.

joeydewaal avatar May 25 '25 20:05 joeydewaal

Yeah, that's about 3 big projects down on the to-do list.

I plan to put out a roadmap alongside the 0.9 release so people have more of an idea of where the high-level direction is going.

abonander avatar May 26 '25 03:05 abonander

That would be great, I'm happy to help where I can.

joeydewaal avatar May 26 '25 09:05 joeydewaal