
Add (and test) explicit support for PgBouncer

mehcode opened this issue 4 years ago • 12 comments

Notable differences

  • Prepared statements are invalid after a SYNC (so they can be used but may not be cached).

  • Apparently does not support the startup parameter IntervalStyle. That may need to be deferred to a SET after connect.

mehcode avatar Jan 16 '20 19:01 mehcode

How would PgConnection differentiate? Maybe a query parameter? ?pgbouncer=1 or something.

abonander avatar Jan 16 '20 19:01 abonander

@abonander My hope is we can identify that it's PgBouncer on start-up. If not, we'll need to tinker and provide configuration for people to use. Perhaps a way to disable the prepared statement cache in the database URL.

mehcode avatar Jan 16 '20 19:01 mehcode

Any news on this? I would love to be able to use PgBouncer with my sqlx apps.

Firaenix avatar Sep 09 '21 02:09 Firaenix

We'd gladly accept a pull request that improves compatibility, but otherwise it's not currently a priority.

abonander avatar Sep 09 '21 02:09 abonander

You should be able to get support by turning off the prepared statement cache. Set the cache capacity to 0. See https://docs.rs/sqlx/0.5.5/sqlx/postgres/struct.PgConnectOptions.html#method.statement_cache_capacity

I haven't tried it but from the OP of this issue, it sounds like this should work.
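A minimal sketch of that suggestion. The host, port, user, and database names are placeholders; `statement_cache_capacity` is the real `PgConnectOptions` method linked above, and port 6432 is PgBouncer's conventional listen port:

```rust
use sqlx::postgres::PgConnectOptions;
use sqlx::ConnectOptions;

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    // Disable the prepared statement cache so sqlx does not try to
    // reuse named statements across a PgBouncer transaction boundary.
    let options = PgConnectOptions::new()
        .host("localhost")
        .port(6432) // connecting through PgBouncer, not directly to Postgres
        .username("app")
        .database("app")
        .statement_cache_capacity(0);

    let mut conn = options.connect().await?;
    // ... run queries as usual ...
    let _ = &mut conn;
    Ok(())
}
```

Note this only disables *caching*; as discussed further down the thread, sqlx may still prepare statements for parameterized queries.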

mehcode avatar Sep 09 '21 16:09 mehcode

I can confirm that it does indeed work with the cache capacity set to 0.

Specifically, using PgBouncer's transaction pooling mode and statement cache capacity at 0, I was able to use sqlx with AWS Lambda to begin and commit transactions.

TuhinNair avatar Feb 01 '22 15:02 TuhinNair

I don't think setting the cache capacity to 0 works in all cases. If you run queries w/o any transaction and the cache size is 0, SQLx will still prepare the statement if the query has any arguments:

https://github.com/launchbadge/sqlx/blob/32f1273565e1cd676e0514655fe8aec67b59b634/sqlx-core/src/postgres/connection/executor.rs#L209-L214

https://github.com/launchbadge/sqlx/blob/dd27aa05870e71c642116809d14773a7b06c354c/sqlx-core/src/postgres/connection/executor.rs#L177-L179

Since pgbouncer doesn't keep the "prepare" step and the execution together, this will lead to confusing error messages where prepared statements cannot be found.

I'm wondering if there is a way to pass arguments to a query w/o relying on "prepare statement" (and w/o having the user to do manual but safe string/parameter escaping).

crepererum avatar Feb 18 '22 09:02 crepererum

I don't really think that's something that SQLx should tackle. I think that could be built as a generic SQL interpolation library.

If you have access to PgBouncer's config, it looks like you can set the pooling mode to Session which should make everything work as expected. Kinda defeats the point of PgBouncer, I guess, but it would work.

abonander avatar Feb 18 '22 21:02 abonander

Hi,

I got the same issue, with the error `prepared statement "sqlx_s_1" already exists` when running through PgBouncer. Setting the cache capacity to 0 and `persistent(false)` at the query level did not help. I finally worked around it by executing `DEALLOCATE ALL;` before each query execution.
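A sketch of that workaround. The table, column, and helper name are placeholders; `DEALLOCATE ALL` drops every prepared statement on whatever server connection PgBouncer has currently assigned:

```rust
use sqlx::postgres::PgPool;

// Hypothetical helper: clear leftover prepared statements before running
// a query through PgBouncer in transaction pooling mode.
async fn fetch_user_ids(pool: &PgPool, name: &str) -> Result<Vec<(i32,)>, sqlx::Error> {
    // Drop all prepared statements on the current server connection.
    sqlx::query("DEALLOCATE ALL").execute(pool).await?;

    sqlx::query_as::<_, (i32,)>("SELECT id FROM users WHERE username = $1")
        .bind(name)
        .fetch_all(pool)
        .await
}
```

One caveat: with a pool larger than one connection, the `DEALLOCATE ALL` and the query itself may land on different pooled connections, so this is most reliable with a pool capped at a single connection.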

uthng avatar Apr 06 '22 09:04 uthng

@uthng more discussion here: https://github.com/launchbadge/sqlx/pull/1740#issuecomment-1088045794

abonander avatar Apr 07 '22 18:04 abonander

Adding my workaround here, although it will bottleneck on a single connection:

```rust
let pool = PgPoolOptions::new()
    .max_connections(1)
    .connect(db_url)
    .await?;
```

The pool above will work with PgBouncer.

digizeph avatar Aug 16 '22 00:08 digizeph

Another approach would be to implement proper prepared statement support in pgbouncer since sqlx isn't the only client suffering from this limitation, see https://github.com/pgbouncer/pgbouncer/issues/695

crepererum avatar Aug 16 '22 07:08 crepererum

Just wanted to mention that I've had that exact issue with AWS RDS Proxy (in front of RDS Postgres), and the workaround from https://github.com/launchbadge/sqlx/issues/67#issuecomment-1090027280 worked.

I couldn't find any info on whether they use PgBouncer under the hood or if it's just a similar problem.

abusch avatar Nov 17 '22 08:11 abusch

I have been looking into this problem and my conclusions are the following:

Using pgbouncer in pool_mode=session is fine (quite pointless, but fine). The problem is with pool_mode=transaction, and it is twofold:

  1. We cannot use named statements.
  2. Preparing, binding, describing, and executing statements needs to be done in one round-trip.

The reason behind both problems is that after a SYNC, pgbouncer may assign us a different physical connection, and statements are prepared per connection.

In the case of problem 1), if we are using named statements and our physical connection changes, the statement will no longer be there when we try to reuse it. Even worse, some other application will have it in their connection and could get the famous `prepared statement "sqlx_s_1" already exists` error. This can be solved in different ways:

  a) Implementing unnamed statement support in sqlx (not hard, I have a proof of concept).
  b) Setting the statement cache size to 0 AND setting the option `server_reset_query_always=1` in pgbouncer, so it removes any prepared named statements we create. If we don't want pgbouncer to do the cleanup, we can do it manually by executing `DEALLOCATE ALL;` before each query execution.
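A pgbouncer configuration fragment illustrating option b). The parameter names are real pgbouncer settings; the values shown are examples (`DISCARD ALL` is pgbouncer's default reset query and also drops prepared statements):

```ini
[pgbouncer]
pool_mode = transaction
; Run the reset query after every transaction, not only on session close,
; so leftover prepared statements are dropped before the server connection
; is handed to another client.
server_reset_query = DISCARD ALL
server_reset_query_always = 1
```

On the client side this would be paired with `statement_cache_capacity(0)` as described earlier in the thread.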

Problem 2) is much more tricky. Assuming that we solved problem 1) and we are no longer caching statements we still have the problem that currently sqlx does the following when executing queries:

1. Parameter type name -> OID resolution.  We only have the names of the type parameters, so we run a query to get their OIDs. Potentially does a full blown query.
2. PREPARE
3. DESCRIBE
4. (*) SYNC
5. (*) Parameter type OID -> extended type info (potentially does a full blown query, i.e. at least PREPARE SYNC BIND EXECUTE CLOSE_PORTAL SYNC)
6. (*) Return type OID -> extended type info (potentially does a full blown query, similar to the previous one)
7. BIND
8. EXECUTE
9. CLOSE_PORTAL
10. SYNC

The problem is that every time that we do a SYNC (see asterisks in previous list), potentially pgbouncer changes our connection and we lose our prepared statement (anonymous or not), making the binding and execution fail.

Note that we can send PREPARE DESCRIBE BIND EXECUTE SYNC together, in one round-trip, but that does not help, because we need the extended type info from steps 5 and 6 before we return the first row. Assuming that all the parameters are known before we prepare the statement, step 5 can be done together with step 1, but step 6 still requires the output from step 3, so that one cannot be moved. It's also not possible to defer the requests to get the extended type info until after we do EXECUTE SYNC, because we are streaming the rows and the first row already needs that metadata available.

Possible solutions that come to my mind:

  a) PREPARE twice. This is a dumb solution, but it works. Since steps 4, 5, and 6 may have destroyed our prepared statement, before step 7 we PREPARE it again.

  b) Cache locally the information from pg_catalog that we need, and update this cache every time we run a query. We would do this just before step 1, and it would prevent steps 1, 5, and 6 from reaching out to the DB. This cache would be shared by all the connections in the pool. The first query run would need to fetch the whole catalog (basically execute fetch_type_by_oid for all OIDs), but subsequent ones would only need to fetch new OIDs (OIDs greater than the latest we have in the cache), if any. I'm not sure if updating the cache is even necessary... the compile-time checks were run against some types, so I'm not sure if changing them afterwards is safe. Which brings me to another possibility...

  c) Use the data from the compile-time checks. All the metadata required could be extracted during compilation and stored in the binary. At runtime we would just check this data.

  d) Parse the query to know, before the PREPARE, the type info of the result columns. Fetch those during step 1, and move step 5 to step 1 as well. Send all commands in one round-trip.

I'm by far not an expert in this codebase, so I may have gotten things wrong. What do you think, @abonander? In any case, I'm willing to help solve this.

vtselfa avatar Jul 25 '23 22:07 vtselfa

It seems that pgcat (a pgbouncer alternative) has just added support for prepared statements. See https://postgresml.org/blog/making-postgres-30-percent-faster-in-production. If so, no changes to sqlx would be needed.

vtselfa avatar Jul 26 '23 09:07 vtselfa

Hm, so https://github.com/pgbouncer/pgbouncer/pull/845 works in theory. But on testing I encountered that sqlx fails to query with `encountered unexpected or invalid data: expecting ParseComplete but received ParameterDescription` in some cases (also mentioned at https://github.com/pgbouncer/pgbouncer/pull/845#discussion_r1320126353).

Specifically, it is happening in this query:

```rust
#[instrument(skip(self, username))]
async fn user_exists(&self, username: &str) -> bool {
    let exists = sqlx::query("SELECT 1 FROM users WHERE username = $1")
        .bind(username)
        .fetch_all(self.get_pool())
        .await;
    match exists {
        Ok(exists) => {
            debug!("[POSTGRES] [user_exists] {:?}", exists.len());
            exists.len() == 1
        }
        Err(e) => {
            if !matches!(e, sqlx::Error::RowNotFound) {
                error!("[DB] Error checking if user exists: {}", e);
            }
            false
        }
    }
}
```

I am not sure which side is buggy or incomplete here, but I thought it was worth mentioning so it isn't forgotten in the future. For reference, this was tested with `sqlx = { version = "0.7.1", features = ["postgres", "runtime-tokio-rustls"] }` and commit 70ad45b7ec0d183caa65e15fef2e7b8ed6926957 of the PR linked above.

Edit: This was a bug in the pgbouncer PR and is fixed now it seems :)

MTRNord avatar Sep 08 '23 17:09 MTRNord