Program stalls when async streaming query results.
Bug Description
Program stalls when trying to async stream query results using SQLx.
Minimal Reproduction
tracing::trace!("Querying SQL server...");
let mut rows = sqlx::query!(
"SELECT `id`, `name`, `group`
FROM `users`
WHERE `group`=?",
group
).fetch(&self.db_pool);
tracing::trace!("Streaming results...");
while let Some(row) = rows.try_next().await? {
tracing::trace!("I read a row!");
}
After many repeated calls to the above snippet/function, the program stalls at "Streaming results...". I was being pretty abusive, calling it hundreds (maybe thousands) of times consecutively.
If I change from `fetch` to `fetch_all` and adjust the code to read from the resulting `Vec`, the program stalls at "Querying SQL server..." without ever returning a `Vec`.
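For clarity, the `fetch_all` variant looks roughly like this (a sketch; same query and bindings as above):

tracing::trace!("Querying SQL server...");
// fetch_all variant: the same query, but collected into a Vec up front.
let rows = sqlx::query!(
    "SELECT `id`, `name`, `group`
     FROM `users`
     WHERE `group`=?",
    group
)
.fetch_all(&self.db_pool)
.await?;
tracing::trace!("Streaming results...");
for _row in &rows {
    tracing::trace!("I read a row!");
}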
When stalled, the core the SQLx thread runs on is pegged: my program uses 25% of total CPU on a 4-core laptop, i.e. one full core. It doesn't seem like a deadlock?
Info
- SQLx version: 0.7
- SQLx features enabled: "chrono", "macros", "mysql", "runtime-tokio-native-tls", "rust_decimal"
- Database server and version: MySQL Win64 8.0.35
- Operating system: Pop OS Linux
- rustc --version: cargo 1.75.0-nightly (6fa6fdc76 2023-10-10)
Just following up. For me, SQLx also stalls on:
let mut transaction = self.db_pool.begin().await?;
I'm running single-threaded with no other transactions, but I'm slamming it pretty hard; maybe too many previous transactions have not yet completed? It's a long story: I'm still testing my code. In this case, I was inadvertently just opening transactions and committing them, without inserting anything inside the transaction.
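Concretely, the pattern was roughly this (a reconstruction, not the exact code):

// Reconstruction of the transaction pattern described above: begin, then
// commit immediately, with no statements executed in between.
let transaction = self.db_pool.begin().await?;
transaction.commit().await?;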
I also tried pinning my futures version to 0.3.19 and my tokio version to 1, the same versions SQLx uses, and that didn't help.
I also tried SQLx version 0.6 and it seems to have the same problem.
Of course, I could also be doing something incorrectly on my end - but I can't think of what it could be... I'm not sure if you've heard of anything like this before?
- Have you checked your database logs? Maybe some DB cleanup is blocking your execution.
- Have you waited a while longer to check whether it eventually returns results?
Sorry for the delay. I checked into this; there is no other activity apart from my queries. I've also spent some time migrating from MySQL to MariaDB just in case it was an issue with MySQL. The migration didn't help.
I'm hammering the server with SELECT requests (which is arguably improper). Eventually SQLx hangs when attempting to stream one of the queries' results. This is repeatable; it happens every time, less than 20 seconds into execution.
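To make the access pattern concrete, the driver loop is roughly this (a sketch; `query_users` is a made-up name standing in for the function containing the SELECT snippet at the top of this issue):

// Rough sketch of the driver loop; `query_users` is a hypothetical stand-in
// for the function containing the SELECT snippet at the top of this issue.
for i in 0..10_000u32 {
    self.query_users(group).await?;
    tracing::trace!("completed query {i}");
}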
Could be related to an exhausted connection pool?
An additional observation: I can insert 29 records without issue, but the 30th record stalls. If I Ctrl+C and rerun my program, it loads those 29 records from the table, inserts an additional 29 records, and stalls on the 30th; i.e. the 30th insert always fails.
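Sketched out, the insert loop looks roughly like this (table and column names are assumed from the SELECT above; the real code differs):

// Rough sketch of the insert loop: with the behaviour described above, the
// 30th iteration never completes. Table and column names are assumed.
for i in 0..100u32 {
    tracing::trace!("inserting record {i}...");
    sqlx::query!(
        "INSERT INTO `users` (`name`, `group`) VALUES (?, ?)",
        format!("user-{i}"),
        group
    )
    .execute(&self.db_pool)
    .await?;
    tracing::trace!("record {i} inserted");
}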
Also note that I see this issue with both MySQL and a fresh install of MariaDB, so I suspect the issue is in the client rather than the server.
My default connection pool options look like this:
PoolOptions {
    max_connections: 10,
    min_connections: 0,
    connect_timeout: 30s,
    max_lifetime: Some(
        1800s,
    ),
    idle_timeout: Some(
        600s,
    ),
    test_before_acquire: true,
}
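For reference, here's a sketch of how a pool with roughly these options might be built (the connection URL is a placeholder, and in SQLx 0.7 the connect timeout is set via `acquire_timeout` rather than `connect_timeout`):

use std::time::Duration;
use sqlx::mysql::{MySqlPool, MySqlPoolOptions};

// Sketch of building a pool with roughly the options shown above.
// The connection URL is a placeholder.
async fn make_pool() -> sqlx::Result<MySqlPool> {
    MySqlPoolOptions::new()
        .max_connections(10)
        .min_connections(0)
        .acquire_timeout(Duration::from_secs(30))
        .max_lifetime(Some(Duration::from_secs(1800)))
        .idle_timeout(Some(Duration::from_secs(600)))
        .test_before_acquire(true)
        .connect("mysql://user:password@localhost/mydb")
        .await
}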
I'm facing the same issue on a single connection. When fetching a large dataset (50,000+ rows) it starts blocking. It also affects other Tokio tasks: Axum, for example, can no longer handle requests until the query completes.
Might this be due to a table lock that is applied during fetching?
> Might this be due to a table lock that is applied during fetching?
Not in my case - there are no table locks, single-threaded execution, and no other SQL clients being served... I'm just hammering the SQL server with simple SELECT and INSERT statements.
The Clippy lint `significant_drop_in_scrutinee` helped me resolve the issue.
I found that a `match` on an `Option` enclosing a `MutexGuard` was causing a deadlock. I can't explain why the deadlock took effect on my SQLx queries rather than at the site of the problem code. Maybe my program just happened to deadlock while waiting for a future that takes a relatively long time to complete, such as a SQL query? Or it happened to stall when there were too many futures awaiting?
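To illustrate the class of bug the lint catches (a hypothetical reconstruction, not my actual code): the temporary `MutexGuard` created in the `match` scrutinee lives until the end of the `match`, so a `std::sync::Mutex` stays locked across the `.await` inside the arm, and any other task that tries to lock the same mutex can block the thread.

use std::sync::Mutex;
use sqlx::MySqlPool;

// Hypothetical reconstruction of the pattern flagged by
// `significant_drop_in_scrutinee`: the guard produced in the scrutinee is
// still alive inside the arms, so the lock is held across the `.await`.
async fn problematic(current_id: &Mutex<Option<i64>>, pool: &MySqlPool) -> sqlx::Result<()> {
    match *current_id.lock().unwrap() {
        Some(id) => {
            // The mutex is still locked here; another task blocking on
            // `current_id.lock()` while this query is pending can wedge a
            // single-threaded runtime.
            sqlx::query("SELECT `name` FROM `users` WHERE `id` = ?")
                .bind(id)
                .fetch_all(pool)
                .await?;
        }
        None => {}
    }
    Ok(())
}

// One fix: copy the value out so the guard is dropped before any `.await`.
async fn fixed(current_id: &Mutex<Option<i64>>, pool: &MySqlPool) -> sqlx::Result<()> {
    let id = *current_id.lock().unwrap(); // guard is dropped at the end of this statement
    if let Some(id) = id {
        sqlx::query("SELECT `name` FROM `users` WHERE `id` = ?")
            .bind(id)
            .fetch_all(pool)
            .await?;
    }
    Ok(())
}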
Either way, I'm marking this closed.
For others who might be having a similar issue and would like these additional lints, I enabled them by putting the following snippet in my lib.rs. Although `significant_drop_in_scrutinee` is considered a nursery lint, it ended up saving the day for me:
#![warn(clippy::all, clippy::pedantic, clippy::nursery, clippy::cargo)]