sqlx icon indicating copy to clipboard operation
sqlx copied to clipboard

Tests fail randomly with "PoolTimedOut"

Open jandillmann opened this issue 1 year ago • 12 comments

Bug Description

I'm writing an API service using actix-web and sqlx. Integration tests use #[sqlx::test] with and without migrations. If I'm running more than around 20 tests at least one of them, but usually two or three, fail with failed to connect test pool: PoolTimedOut. Running less tests at a time does not show this issue. I'm including the backtrace at the bottom.

Info

  • SQLx version: 0.6.3
  • SQLx features enabled: ["chrono", "macros", "migrate", "offline", "postgres", "runtime-actix-rustls", "uuid"]
  • Database server and version: PostgreSQL 14
  • Operating system: Pop!_OS 22.04 LTS x86_64
  • rustc --version: rustc 1.70.0 (90c541806 2023-05-31)
---- some_endpoint::returns_error_when_time_selection_is_invalid stdout ----
thread 'some_endpoint::returns_error_when_data_is_invalid' panicked at 'failed to connect test pool: PoolTimedOut', /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:186:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::expect
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1046:23
   4: sqlx_core::testing::run_test_with_pool::{{closure}}::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:183:20
   5: sqlx_core::testing::run_test::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:218:77
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/future/future.rs:125:9
   7: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:57
   8: tokio::runtime::coop::with_budget
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:107:5
   9: tokio::runtime::coop::budget
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:73:5
  10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:25
  11: tokio::runtime::scheduler::current_thread::Context::enter
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:350:19
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:540:36
  13: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:57
  14: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/macros/scoped_tls.rs:61:9
  15: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:27
  16: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:530:19
  17: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:154:24
  18: tokio::runtime::runtime::Runtime::block_on
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/runtime.rs:302:47
  19: sqlx_rt::rt_tokio::test_block_on
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-rt-0.6.3/src/rt_tokio.rs:42:5
  20: sqlx_core::testing::run_test
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:211:5
  21: sqlx_core::testing::run_test_with_pool
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:182:5
  22: <fn(sqlx_core::pool::Pool<DB>) .> Fut as sqlx_core::testing::TestFn>::run_test
             at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:85:9
  23: api::some_endpoint::returns_error_when_data_is_invalid
             at ./tests/api/some_endpoint.rs:249:1
  24: api::some_endpoint::returns_error_when_data_is_invalid::{{closure}}
             at ./tests/api/some_endpoint.rs:250:75
  25: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
  26: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

jandillmann avatar Jun 27 '23 15:06 jandillmann

What is the max_connections setting on your Postgres Pool? (PgPoolOptions)

I'm curious, if you increase the max connections for the pg pool, doesn't the issue go away? I assume it can't get a connection and times out. Is this what aquire_timeout is meant to help with?

szuke avatar Jun 30 '23 14:06 szuke

Unfortunately, this does not seem to change anything. These are the settings used for PgPoolOptions:

    acquire_timeout: 10000
    min_connections: 5
    max_connections: 100

Timeout is in millieseconds, and the PostgreSQL database also has max connections of 100, which should be enough for 27 tests.

jandillmann avatar Jul 03 '23 07:07 jandillmann

The problem seems to persist in version 0.7.0

Xydez avatar Jul 07 '23 21:07 Xydez

I was able to consistently reproduce the PoolTimedOut error on mySQL. I opened a new issue because in my case the problem only occurs on mySQL and is not present when using the Postgres driver.

mdecimus avatar Jul 24 '23 08:07 mdecimus

I'm experiencing a similar issue with sqlx 0.7 (but not sqlx 0.6) and Postgres.

  • My unit tests with sqlx 0.6.x work fine.
  • But when I upgrade to 0.7.0 or 0.7.1, if max_connections > 1, I get this PoolTimedOut error randomly in some of my unit tests.

Reverting to sqlx 0.6.x resolves the issue.

I hope this can be fixed, as the upgrade otherwise was pretty seamless.

Here is what I tried:

  • Setting PostgreSQL connection to 500.
  • Setting PgPoolOptions max_connections to 2, 5, 10, 300. All these settings caused issues.
  • Running cargo test -- --test-threads=1 (I still encountered the issue, all tests are annotated with #[serial] as well).
  • Printing before_acquire and after_release, things seemed normal.

jeremychone avatar Jul 26 '23 17:07 jeremychone

Bumping this. Experiencing the same issues as @jeremychone

Reverting to 0.6 seems to be the best course of action for me at the moment.

bweis avatar Sep 28 '23 23:09 bweis

I am encountering similar issue, program fails when running several test cases that concurrently acquire connection from a pool , if running only one or two test cases, the test cases can complete without the time-out error.

Here is the logging messages for sqlx::Pool , looks like acquired connections do not return back to the pool for some reasons which cause the field Pool::size decreasing . It might be similar to issues #622 and #2647 , not sure if that is the same root cause

[DEBUG] pool stats : Pool { size: 2, num_idle: 2, is_closed: false, options: PoolOptions {
    max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
    idle_timeout: Some(45s),  test_before_acquire: true } }

[DEBUG] pool stats : Pool { size: 2, num_idle: 1, is_closed: false, options: PoolOptions {
    max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
   idle_timeout: Some(45s), test_before_acquire: true } }

[DEBUG] pool stats : Pool { size: 1, num_idle: 1, is_closed: false, options: PoolOptions {
    max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
   idle_timeout: Some(45s), test_before_acquire: true } }

[ERROR] pool stats : Pool { size: 0, num_idle: 0, is_closed: false, options: PoolOptions {
    max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
    idle_timeout: Some(45s),  test_before_acquire: true } }

Development Environment

  • SQLx version: 0.7.3
  • SQLx features enabled: [version="^0.7.3", features=["runtime-tokio", "mysql"]]
  • Database server and version: MariaDB 10.3.22
  • Operating system: Ubuntu LTS x86_64
  • rustc --version: rustc 1.68.0

current workaround

Rollback to version 0.6.3 , then all test cases completes without the error

metalalive avatar Jan 02 '24 06:01 metalalive

Rollback to version 0.6.3 , then all test cases completes without the error

I'm experiencing this issue as well, I've tried to roll back to 0.6.3 and I am still getting the same issue. Even when I force the tests to run on 1 thread, some of the tests seem to randomly fail. It seems that the pool is not dropping a connection when the test is complete.

LeviticusNelson avatar Jan 30 '24 17:01 LeviticusNelson

Same issue here with latest sqlx. Running against a local postgres, timeout on 10 seconds and a pool with 20 connections.

I have 3 unit tests and 1 of them (but random which one) fails with a PoolTimedOut error after the timeout is up. Each test is a very simple insert or select.

#[cfg(test)]
mod tests {
	#![allow(unused)]
	use crate::_dev_utils;

	use super::*;
	use anyhow::Result;
	use serial_test::serial;

	#[serial]
	#[tokio::test]
	async fn test_create_ok() -> Result<()> {
		....
	}

	#[serial]
	#[tokio::test]
	async fn test_get_err_not_found() -> Result<()> {
                ...
	}

	#[serial]
	#[tokio::test]
	async fn test_delete_err_not_found() -> Result<()> {
		...
	}
}

Reverting to sqlx 0.6 has fixed this problem completely.

LennDG avatar Feb 01 '24 14:02 LennDG

I was able to work around this issue with the latest version (0.7.3) and since I spent many hours banging my head against my keyboard, I'd thought I'd share this.

So, it looks like the semaphore pool instantiates children pools that then delivers to each test. That semaphore/master pool seems to have 20 max_connections hardcoded. I saw that in the source for 0.6.3: https://github.com/launchbadge/sqlx/blob/v0.6.3/sqlx-core/src/postgres/testing/mod.rs#L82 although I can no longer find it for the latest version.

In any case, I had to create a new pool for every test using the connection options and ignoring the pool options

#[sqlx::test]
async fn should_accept_valid_api_keys(_: PgPoolOptions, options: PgConnectOptions) {
    // Arrange
    let db = PgPoolOptions::new().connect_with(options).await.unwrap();

    let app = spawn_app(&db).await;

    ...
}

Credit: I actually found the solution in this blog post about integration tests in Rocket

aldofunes avatar Mar 20 '24 15:03 aldofunes

any progress?

plusls avatar Mar 26 '24 16:03 plusls

https://github.com/rust10x/rust-web-app/blob/main/crates/libs/lib-core/src/model/store/mod.rs says:

pub async fn new_db_pool() -> sqlx::Result<Db> {
	// * See NOTE 1) below
	let max_connections = if cfg!(test) { 1 } else { 5 };

	PgPoolOptions::new()
		.max_connections(max_connections)
		.connect(&core_config().DB_URL)
		.await
}

// NOTE 1) This is not an ideal situation; however, with sqlx 0.7.1, when executing 'cargo test', some tests that use sqlx fail at a
//         rather low level (in the tokio scheduler). It appears to be a low-level thread/async issue, as removing/adding
//         tests causes different tests to fail. The cause remains uncertain, but setting max_connections to 1 resolves the issue.
//         The good news is that max_connections still function normally for a regular run.
//         This issue is likely due to the unique requirements unit tests impose on their execution, and therefore,
//         while not ideal, it should serve as an acceptable temporary solution.
//         It's a very challenging issue to investigate and narrow down. The alternative would have been to stick with sqlx 0.6.x, which
//         is potentially less ideal and might lead to confusion as to why we are maintaining the older version in this blueprint.

thejozika avatar May 06 '24 19:05 thejozika