
PoolTimeOut error happens frequently

Open option-greek opened this issue 3 years ago • 10 comments

Bug Description

A PoolTimedOut error happens frequently when the server has been idle. If I don't leave a gap between requests, the timeout doesn't happen. My guess is that when a connection times out, the pool doesn't detect it. Note: I know it's not a database issue, because connecting directly works (and also works after the pool has timed out). The worst part is that if I make a second request, it still waits until the first one times out.

Steps:

  1. Create a pool and have it run a query when accessed through a tokio task (could be any web server framework; I'm using warp)
  2. Make the first request. It succeeds.
  3. Keep the server idle for some time (I waited 1 hour; it also happens with shorter gaps).
  4. Make a second request: it fails with a pool timeout. While it's hanging waiting to time out, any number of parallel requests will also hang.
  5. Make a third request after the PoolTimedOut error. It succeeds again.

Minimal Reproduction

This is how I'm initializing the pool.


    match PgPoolOptions::new()
        .max_connections(10)
        .min_connections(3)
        .connect(&prefs.pg_conn_string)
        .await
    {
        Ok(pool) => {
            println!("Successfully initialized the connection pool for Postgres");
            return Some(pool);
        }
        Err(err) => {
            println!("Unable to create the connection pool for Postgres: {:?}", err);
        }
    }

Info

  • SQLx version: 0.6.1
  • SQLx features enabled: "runtime-tokio-rustls", "postgres", "decimal", "chrono", "tls", "uuid"
  • Database server and version: PostgreSQL 14.1
  • Operating system: Ubuntu 22.04
  • rustc --version: rustc 1.62.1 (e092d0b6b 2022-07-16)

option-greek avatar Jan 05 '23 08:01 option-greek

If anyone knows how to use deadpool with sqlx, please provide a sample. The previous example provided here no longer compiles.

option-greek avatar Jan 05 '23 12:01 option-greek

Workaround: the problem doesn't happen with a max lifetime of 2 minutes and an idle timeout of 1 minute. It works reliably with these settings.
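For reference, a sketch of what those workaround settings look like when building the pool with `PgPoolOptions` (the function name and `conn_string` parameter are placeholders, not from the original report):

```rust
use std::time::Duration;

use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool;

// Sketch only: builds a pool with the workaround settings described above.
// `conn_string` is a placeholder for your own connection string.
async fn make_pool(conn_string: &str) -> Result<PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(10)
        .min_connections(3)
        // Close connections that have sat idle for more than a minute...
        .idle_timeout(Duration::from_secs(60))
        // ...and retire any connection older than two minutes.
        .max_lifetime(Duration::from_secs(120))
        .connect(conn_string)
        .await
}
```

This is configuration only; whether these exact durations are right for your workload depends on how aggressively your network or database closes idle connections.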

option-greek avatar Jan 05 '23 17:01 option-greek

I have the same problem. Are these settings okay?

    .idle_timeout(Some(Duration::from_secs(60)))
    .max_lifetime(Some(Duration::from_secs(120)))

@option-greek

sugerdudu avatar Aug 10 '23 22:08 sugerdudu

I am currently facing the same issue: SQL queries randomly error out with PoolTimedOut, and only rarely do they complete without error.

JimitSoni18 avatar Dec 30 '23 16:12 JimitSoni18

Is anyone still having this issue? How can we make progress here?

rcmgleite avatar Oct 30 '24 00:10 rcmgleite

If anyone knows how to use deadpool with sqlx, please provide a sample. The previous example provided here no longer compiles.

Here is a code snippet I wrote today. (Note that I need a Transaction wrapper with a `'static` lifetime, so I make a wrapper type for the transaction and duplicate some code. @abonander, I wonder if upstream's Transaction could optionally accept an owned `AsRef<Connection> + 'c` so that we don't need to wrap it externally.)

use std::ops::Deref;
use std::ops::DerefMut;
use std::time::Duration;

use deadpool::managed::Metrics;
use deadpool::managed::Object;
use deadpool::managed::Pool;
use deadpool::managed::PoolError;
use deadpool::managed::RecycleResult;
use deadpool::managed::Timeouts;
use futures::future::BoxFuture;
use sqlx::Acquire;
use sqlx::ConnectOptions;
use sqlx::Connection;
use sqlx::PgConnection;
use sqlx::TransactionManager;
use sqlx::postgres::PgConnectOptions;

#[derive(Debug, Clone)]
pub struct DBPool {
    pool: Pool<Manager>,
}

impl DBPool {
    pub fn new(option: PgConnectOptions, max_size: usize) -> Self {
        let pool = Pool::builder(Manager { option })
            .max_size(max_size)
            .timeouts(Timeouts {
                wait: Some(Duration::from_secs(30)),
                create: Some(Duration::from_secs(30)),
                recycle: Some(Duration::from_secs(30)),
            })
            .runtime(deadpool::Runtime::Tokio1)
            .build()
            .unwrap();
        Self { pool }
    }

    pub async fn acquire(&self) -> Result<Object<Manager>, PoolError<sqlx::Error>> {
        self.pool.get().await
    }

    pub async fn begin(&self) -> Result<PostgresTransaction, PoolError<sqlx::Error>> {
        let mut conn = self.acquire().await?;
        <sqlx::Postgres as sqlx::Database>::TransactionManager::begin(&mut conn).await?;
        Ok(PostgresTransaction { conn, open: true })
    }

    pub(crate) fn maintain(&self) {
        const IDLE_TIMEOUT: Duration = Duration::from_secs(10 * 60);
        self.pool.retain(|_, m| m.last_used() < IDLE_TIMEOUT)
    }

    pub(crate) fn clear(&self) {
        self.pool.retain(|_, _| false)
    }
}

#[derive(Debug)]
pub struct Manager {
    option: PgConnectOptions,
}

impl deadpool::managed::Manager for Manager {
    type Type = PgConnection;
    type Error = sqlx::Error;

    async fn create(&self) -> Result<Self::Type, Self::Error> {
        self.option.connect().await
    }

    async fn recycle(&self, conn: &mut Self::Type, _: &Metrics) -> RecycleResult<Self::Error> {
        Ok(conn.ping().await?)
    }
}

#[derive(Debug)]
pub struct PostgresTransaction {
    conn: Object<Manager>,
    open: bool,
}

impl Drop for PostgresTransaction {
    fn drop(&mut self) {
        if self.open {
            // Starts a rollback operation.

            // What this does depends on the database, but generally it means we queue a rollback
            // operation that will happen on the next asynchronous invocation of the underlying
            // connection (including if the connection is returned to a pool).

            <sqlx::Postgres as sqlx::Database>::TransactionManager::start_rollback(&mut self.conn);
        }
    }
}

impl Deref for PostgresTransaction {
    type Target = PgConnection;

    #[inline]
    fn deref(&self) -> &Self::Target {
        &self.conn
    }
}

impl DerefMut for PostgresTransaction {
    #[inline]
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.conn
    }
}

// Implement `AsMut<DB::Connection>` so `Transaction` can be given to a
// `PgAdvisoryLockGuard`.
//
// See: https://github.com/launchbadge/sqlx/issues/2520
impl AsMut<PgConnection> for PostgresTransaction {
    fn as_mut(&mut self) -> &mut PgConnection {
        &mut self.conn
    }
}

impl<'t> Acquire<'t> for &'t mut PostgresTransaction {
    type Database = sqlx::Postgres;

    type Connection = &'t mut PgConnection;

    #[inline]
    fn acquire(self) -> BoxFuture<'t, Result<Self::Connection, sqlx::Error>> {
        Box::pin(futures::future::ok(&mut **self))
    }

    #[inline]
    fn begin(self) -> BoxFuture<'t, Result<sqlx::Transaction<'t, sqlx::Postgres>, sqlx::Error>> {
        sqlx::Transaction::begin(&mut **self)
    }
}

impl PostgresTransaction {
    /// Commits this transaction or savepoint.
    pub async fn commit(mut self) -> Result<(), sqlx::Error> {
        <sqlx::Postgres as sqlx::Database>::TransactionManager::commit(&mut self.conn).await?;
        self.open = false;
        Ok(())
    }

    /// Aborts this transaction or savepoint.
    #[allow(dead_code)]
    pub async fn rollback(mut self) -> Result<(), sqlx::Error> {
        <sqlx::Postgres as sqlx::Database>::TransactionManager::rollback(&mut self.conn).await?;
        self.open = false;
        Ok(())
    }
}
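A hypothetical usage of the wrapper above might look like the following (the query, table, and function names are illustrative, not part of the snippet itself):

```rust
// Sketch only: exercising DBPool/PostgresTransaction from the snippet above.
// The UPDATE statement and `accounts` table are placeholders.
async fn transfer(pool: &DBPool) -> Result<(), Box<dyn std::error::Error>> {
    let mut txn = pool.begin().await?;

    sqlx::query("UPDATE accounts SET balance = balance - 10 WHERE id = $1")
        .bind(1_i64)
        // `PostgresTransaction` derefs to `PgConnection`, which is an Executor.
        .execute(&mut *txn)
        .await?;

    // `commit` consumes the transaction and marks it closed; if we dropped
    // `txn` instead, the Drop impl above would queue a rollback.
    txn.commit().await?;
    Ok(())
}
```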

tisonkun avatar Jan 17 '25 09:01 tisonkun

I've made an example at https://github.com/fast/fastpool/blob/93eb67c/examples/postgres/src/main.rs, where my thinking is that hooks and timeouts should be handled directly on the manager or pool side.

tisonkun avatar Feb 27 '25 07:02 tisonkun

Yeah, this makes sqlx::Pool basically unusable. Every 37 or so minutes my server crashes when the automated health check times out waiting for a connection. I'm doing everything by the book, manually dropping connections right after use, and I've tried every suggestion listed here with regard to pool size and timeouts. Is there any work on a fix for this?

rektdeckard avatar May 26 '25 04:05 rektdeckard

There isn't one single fix in the works because the error doesn't have one single cause. If it were caused by a single, obviously fixable issue, it'd be addressed ASAP.

Inherently, the error happens because we chose to have acquire() (and everything built on it) always time out rather than wait forever if a connection never becomes available. It's relatively quick feedback that there's some sort of problem, but there are multiple reasons this can happen.

  • the backlog of acquire() calls has grown so large that they literally can't all be served before they start timing out;
  • something is effectively leaking connections from the pool so they can't be reused;
  • new connection attempts keep failing;
  • all the connections are being hogged by slow queries.

And that's just what I can come up with off the top of my head. The exact situation depends entirely on how you're using it. Observability and logging should help to determine what's going on.
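As a starting point for that observability, a minimal sketch that periodically logs pool metrics (the function name, interval, and log format are my own; `Pool::size` and `Pool::num_idle` are existing sqlx APIs):

```rust
use std::time::Duration;

use sqlx::PgPool;

// Sketch only: spawn a background task that logs how many connections
// exist and how many are idle. If `size` is pinned at max_connections and
// `idle` stays at 0 when PoolTimedOut fires, connections are being held
// (leaked or hogged by slow queries) rather than failing to open.
fn spawn_pool_monitor(pool: PgPool) {
    tokio::spawn(async move {
        loop {
            println!(
                "pool: size={} idle={}",
                pool.size(),     // total connections, idle + in use
                pool.num_idle()  // connections currently idle
            );
            tokio::time::sleep(Duration::from_secs(30)).await;
        }
    });
}
```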

I'm not about to pretend that the pool is bug-free, but if timeouts are due to a bug, we can't really do anything without a way to reliably reproduce it.

abonander avatar May 26 '25 09:05 abonander

@abonander understood, and it was actually my mistake – had a contention issue which was causing threads to get stuck waiting for each other. Thanks for the note.

rektdeckard avatar May 26 '25 22:05 rektdeckard