Postgres: force generic plan for better nullability inference.
Currently, when using the `query!` and `query_as!` macros, sqlx runs `EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE {statement_id}` to get the query plan of a prepared statement/query. It then looks through the plan and tries to be more accurate with the type inference.
If a prepared statement has query parameters, sqlx needs to fill them in with valid arguments, and it uses NULL for every parameter because NULL is always a valid argument.
This causes the query planner to return a plan that is optimized for the case where every parameter is NULL, which is not always correct.
This PR forces Postgres to use a generic execution plan when describing a prepared statement with parameters. That way the planner won't make optimizations based on the given parameter values.
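To illustrate the problem (table and statement names here are hypothetical, just for demonstration):

```sql
-- A prepared statement with one parameter.
PREPARE stmt AS SELECT id, name FROM users WHERE id = $1;

-- Substituting NULL for $1 lets the planner constant-fold "id = NULL"
-- to false and collapse the whole plan to a trivial
-- "Result (One-Time Filter: false)" node, which no longer reflects the
-- plan shape the query would have with real arguments.
EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE stmt(NULL);
```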
Fixes #3539, #1265, https://github.com/launchbadge/sqlx/issues/2796#issuecomment-1923456127, #3408, #2127
I think providing a constant value may end up with the same problem in different queries.
I'm wondering if at a certain point, it would be easier to parse the SQL, just to get an idea of its structure. We'd still want to use the database for type analysis though.
I just found out about `force_generic_plan`, which gets the same result without providing a random value for each type.
That sounds like it has potential, yeah. Reducing the planning overhead might help with compile times a bit too.
Make sure to do `SET LOCAL` so it doesn't persist for the lifetime of the connection (you'll need to do it in a transaction). `describe()` can be called at runtime too.
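Something like this, as a sketch (`stmt` standing in for whatever statement is under analysis):

```sql
BEGIN;
-- SET LOCAL scopes the setting to this transaction only.
SET LOCAL plan_cache_mode = force_generic_plan;
-- With the generic plan forced, $1 stays a placeholder in the plan output
-- instead of the plan being specialized on the NULL that was passed in.
EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE stmt(NULL);
ROLLBACK;
```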
Technically breaking since it can change the output of analysis.
Also kind of why I'm starting to think maybe we should parse the SQL. The output of `EXPLAIN` isn't meant to be stable at all.
I think that's going to make the macros quite a bit slower, though.
It'd be on par with parsing the `EXPLAIN` output, and one less round-trip to the server (which is going to be where most of the overhead is anyway).
I had some time on my hands the last few weeks, so I thought I'd try to experiment a bit with parsing the SQL myself. Currently sqlx needs to:
- Prepare the query.
- Fetch the `attnotnull` option (see the sketch below).
- Get the query plan from the database (for Postgres).
That's 3 queries per macro invocation. Steps 2 and 3 combined take roughly 750μs–3ms from what I've tested (Postgres on localhost).
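For reference, the step-2 lookup amounts to checking `pg_attribute.attnotnull` for each result column; the exact query sqlx sends may differ, but it's along these lines:

```sql
-- attrelid (table OID) and attnum (column number) come from the
-- prepared statement's row description.
SELECT attnotnull
FROM pg_attribute
WHERE attrelid = $1
  AND attnum = $2;
```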
With my implementation:
- Prepare the query.
- Fetch the database tables and columns (one query that can be cached; see the sketch below).
- Parse and go through the query.
So this means one query for every macro invocation, plus one query once for all the macros (per database). My implementation takes between 10μs and 250μs in debug mode (depending on how big the query is).
So it's quite a bit faster, and it also gives a lot more control over nullability, which makes the macros more correct. My implementation is, however, incomplete and far from ready: it doesn't support schemas, and the integration with query parameters in sqlx is difficult, but it's enough to experiment a bit.
So it's definitely feasible, but a lot of work to get right.
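For the cached schema snapshot in step 2, a single catalog query along these lines would work (illustrative, not the experiment's actual code):

```sql
-- Fetch every column's name and NOT NULL flag in one pass, keyed by table OID.
SELECT c.oid, c.relname, a.attnum, a.attname, a.attnotnull
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind IN ('r', 'v', 'm')  -- tables, views, materialized views
  AND a.attnum > 0                  -- skip system columns
  AND NOT a.attisdropped;
```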
So what's the current state of this PR? Is this going in the right direction or do we need a new PR implementing the SQL parsing idea?
My comment is more a proof of concept than anything serious. It's far from ready and a big change. There is also #1598, which would help with the compile-time problem.
As for the PR, I'm not sure. It's the least costly way to solve this problem, but not exactly pretty.
I'm leaning more and more towards parsing the query, because I've been trying to debug the reports in #3546 and the query plan parsing, for SQLite especially, is just completely unapproachable if you don't already understand what it's doing. I've spent all day on it just to end up with a guess of what's going on.
Parsing could also potentially reopen #1491, since it's a lot faster. Also, things like conditional `ORDER BY`, `LIMIT`, and `OFFSET` don't affect nullability, so analysis results could even be cached across those variants.
Yeah, that's about 3 big projects down on the to-do list.
I plan to put out a roadmap alongside the 0.9 release so people have more of an idea of where the high-level direction is going.
That would be great, I'm happy to help where I can.