sqlx
Tests fail randomly with "PoolTimedOut"
Bug Description
I'm writing an API service using actix-web and sqlx. The integration tests use `#[sqlx::test]`, with and without migrations. If I run more than around 20 tests, at least one of them (usually two or three) fails with `failed to connect test pool: PoolTimedOut`. Running fewer tests at a time does not trigger the issue. I'm including the backtrace at the bottom.
Info
- SQLx version: 0.6.3
- SQLx features enabled: ["chrono", "macros", "migrate", "offline", "postgres", "runtime-actix-rustls", "uuid"]
- Database server and version: PostgreSQL 14
- Operating system: Pop!_OS 22.04 LTS x86_64
- `rustc --version`: rustc 1.70.0 (90c541806 2023-05-31)
---- some_endpoint::returns_error_when_time_selection_is_invalid stdout ----
thread 'some_endpoint::returns_error_when_data_is_invalid' panicked at 'failed to connect test pool: PoolTimedOut', /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:186:14
stack backtrace:
0: rust_begin_unwind
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
1: core::panicking::panic_fmt
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
2: core::result::unwrap_failed
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
3: core::result::Result<T,E>::expect
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1046:23
4: sqlx_core::testing::run_test_with_pool::{{closure}}::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:183:20
5: sqlx_core::testing::run_test::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:218:77
6: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/future/future.rs:125:9
7: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:57
8: tokio::runtime::coop::with_budget
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:107:5
9: tokio::runtime::coop::budget
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:73:5
10: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:25
11: tokio::runtime::scheduler::current_thread::Context::enter
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:350:19
12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:540:36
13: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:57
14: tokio::macros::scoped_tls::ScopedKey<T>::set
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/macros/scoped_tls.rs:61:9
15: tokio::runtime::scheduler::current_thread::CoreGuard::enter
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:27
16: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:530:19
17: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:154:24
18: tokio::runtime::runtime::Runtime::block_on
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/runtime.rs:302:47
19: sqlx_rt::rt_tokio::test_block_on
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-rt-0.6.3/src/rt_tokio.rs:42:5
20: sqlx_core::testing::run_test
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:211:5
21: sqlx_core::testing::run_test_with_pool
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:182:5
22: <fn(sqlx_core::pool::Pool<DB>) -> Fut as sqlx_core::testing::TestFn>::run_test
at /home/jan/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sqlx-core-0.6.3/src/testing/mod.rs:85:9
23: api::some_endpoint::returns_error_when_data_is_invalid
at ./tests/api/some_endpoint.rs:249:1
24: api::some_endpoint::returns_error_when_data_is_invalid::{{closure}}
at ./tests/api/some_endpoint.rs:250:75
25: core::ops::function::FnOnce::call_once
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
26: core::ops::function::FnOnce::call_once
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
What is the max_connections setting on your Postgres Pool? (PgPoolOptions)
I'm curious: if you increase the max connections for the pg pool, doesn't the issue go away? I assume it can't get a connection and times out. Is this what `acquire_timeout` is meant to help with?
Unfortunately, this does not seem to change anything. These are the settings used for `PgPoolOptions`:
acquire_timeout: 10000
min_connections: 5
max_connections: 100
The timeout is in milliseconds, and the PostgreSQL database also has a max_connections of 100, which should be enough for 27 tests.
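For reference, those settings map onto the builder like this (a sketch only; the `DATABASE_URL` environment variable and the `make_pool` name are my assumptions, not from the original report, and a live Postgres is required to actually connect):

```rust
use std::time::Duration;

use sqlx::postgres::{PgPool, PgPoolOptions};

// Sketch of the pool settings quoted above; illustrative, since it
// needs a reachable Postgres instance at DATABASE_URL.
async fn make_pool() -> sqlx::Result<PgPool> {
    PgPoolOptions::new()
        .acquire_timeout(Duration::from_millis(10_000)) // 10 s
        .min_connections(5)
        .max_connections(100)
        .connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL must be set"))
        .await
}
```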
The problem seems to persist in version 0.7.0
I was able to consistently reproduce the `PoolTimedOut` error on MySQL. I opened a new issue because in my case the problem only occurs on MySQL and is not present when using the Postgres driver.
I'm experiencing a similar issue with sqlx 0.7 (but not sqlx 0.6) and Postgres.
- My unit tests with sqlx 0.6.x work fine.
- But when I upgrade to 0.7.0 or 0.7.1, if `max_connections > 1`, I get this `PoolTimedOut` error randomly in some of my unit tests.

Reverting to sqlx 0.6.x resolves the issue.
I hope this can be fixed, as the upgrade otherwise was pretty seamless.
Here is what I tried:
- Setting the PostgreSQL connection limit to 500.
- Setting `PgPoolOptions` `max_connections` to 2, 5, 10, 300. All these settings caused issues.
- Running `cargo test -- --test-threads=1` (I still encountered the issue; all tests are annotated with `#[serial]` as well).
- Printing in `before_acquire` and `after_release`; things seemed normal.
Bumping this. Experiencing the same issues as @jeremychone
Reverting to 0.6 seems to be the best course of action for me at the moment.
I am encountering a similar issue: the program fails when running several test cases that concurrently acquire connections from a pool. If I run only one or two test cases, they complete without the timeout error.
Here are the logging messages for `sqlx::Pool`. It looks like acquired connections do not return to the pool for some reason, which causes the field `Pool::size` to decrease. It might be similar to issues #622 and #2647; I'm not sure if the root cause is the same.
[DEBUG] pool stats : Pool { size: 2, num_idle: 2, is_closed: false, options: PoolOptions {
max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
idle_timeout: Some(45s), test_before_acquire: true } }
[DEBUG] pool stats : Pool { size: 2, num_idle: 1, is_closed: false, options: PoolOptions {
max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
idle_timeout: Some(45s), test_before_acquire: true } }
[DEBUG] pool stats : Pool { size: 1, num_idle: 1, is_closed: false, options: PoolOptions {
max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
idle_timeout: Some(45s), test_before_acquire: true } }
[ERROR] pool stats : Pool { size: 0, num_idle: 0, is_closed: false, options: PoolOptions {
max_connections: 3, min_connections: 0, connect_timeout: 20s, max_lifetime: Some(1800s),
idle_timeout: Some(45s), test_before_acquire: true } }
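Stats in this shape can be printed with `Pool`'s `Debug` impl; `size()` and `num_idle()` expose the same counters programmatically (a sketch, shown for Postgres here even though my report is against MariaDB; the API is the same for the MySQL driver):

```rust
use sqlx::PgPool;

// Sketch: Pool implements Debug, so log lines like the above are just `{:?}`.
// size() counts idle + checked-out connections; num_idle() counts idle only,
// so num_idle() should never exceed size() on a healthy pool.
fn log_pool_stats(pool: &PgPool) {
    eprintln!("[DEBUG] pool stats : {:?}", pool);
    debug_assert!(pool.num_idle() as u32 <= pool.size());
}
```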
Development Environment
- SQLx version: 0.7.3
- SQLx features enabled: `[version="^0.7.3", features=["runtime-tokio", "mysql"]]`
- Database server and version: MariaDB 10.3.22
- Operating system: Ubuntu LTS x86_64
- `rustc --version`: rustc 1.68.0
current workaround
Roll back to version 0.6.3; then all test cases complete without the error.
I'm experiencing this issue as well. I've tried rolling back to 0.6.3 and I am still getting the same issue. Even when I force the tests to run on one thread, some of the tests seem to randomly fail. It seems that the pool is not dropping a connection when a test completes.
Same issue here with the latest `sqlx`. Running against a local Postgres, with a 10-second timeout and a pool with 20 connections.
I have 3 unit tests, and 1 of them (but it's random which one) fails with a `PoolTimedOut` error after the timeout is up. Each test is a very simple insert or select.
#[cfg(test)]
mod tests {
    #![allow(unused)]

    use super::*;
    use crate::_dev_utils;
    use anyhow::Result;
    use serial_test::serial;

    #[serial]
    #[tokio::test]
    async fn test_create_ok() -> Result<()> {
        ...
    }

    #[serial]
    #[tokio::test]
    async fn test_get_err_not_found() -> Result<()> {
        ...
    }

    #[serial]
    #[tokio::test]
    async fn test_delete_err_not_found() -> Result<()> {
        ...
    }
}
Reverting to sqlx 0.6 has fixed this problem completely.
I was able to work around this issue with the latest version (0.7.3), and since I spent many hours banging my head against my keyboard, I thought I'd share this.
It looks like the test harness creates a master pool, guarded by a semaphore, that hands a child pool to each test. That master pool seems to have max_connections hardcoded to 20. I saw that in the source for 0.6.3: https://github.com/launchbadge/sqlx/blob/v0.6.3/sqlx-core/src/postgres/testing/mod.rs#L82, although I can no longer find it in the latest version.
In any case, I had to create a new pool for every test using the connection options and ignoring the pool options:
#[sqlx::test]
async fn should_accept_valid_api_keys(_: PgPoolOptions, options: PgConnectOptions) {
// Arrange
let db = PgPoolOptions::new().connect_with(options).await.unwrap();
let app = spawn_app(&db).await;
...
}
Credit: I actually found the solution in this blog post about integration tests in Rocket
any progress?
https://github.com/rust10x/rust-web-app/blob/main/crates/libs/lib-core/src/model/store/mod.rs says:
pub async fn new_db_pool() -> sqlx::Result<Db> {
// * See NOTE 1) below
let max_connections = if cfg!(test) { 1 } else { 5 };
PgPoolOptions::new()
.max_connections(max_connections)
.connect(&core_config().DB_URL)
.await
}
// NOTE 1) This is not an ideal situation; however, with sqlx 0.7.1, when executing 'cargo test', some tests that use sqlx fail at a
// rather low level (in the tokio scheduler). It appears to be a low-level thread/async issue, as removing/adding
// tests causes different tests to fail. The cause remains uncertain, but setting max_connections to 1 resolves the issue.
// The good news is that max_connections still functions normally for a regular run.
// This issue is likely due to the unique requirements unit tests impose on their execution, and therefore,
// while not ideal, it should serve as an acceptable temporary solution.
// It's a very challenging issue to investigate and narrow down. The alternative would have been to stick with sqlx 0.6.x, which
// is potentially less ideal and might lead to confusion as to why we are maintaining the older version in this blueprint.