pgtemp icon indicating copy to clipboard operation
pgtemp copied to clipboard

Shutdown (without persist) hangs in Github actions

Open nihohit opened this issue 1 year ago • 21 comments

I can't repro this locally, so I don't know what the source of the issue is. The temporary DB, created with PgTempDB::async_new().await (so persist isn't set), hangs for minutes after a call to shutdown. Maybe there are some existing connections to the DB, but according to the documentation, this shouldn't block shutdown when persist isn't set.

This happens on a runner running Ubuntu noble, with postgresql-16 installed.

nihohit avatar Jul 09 '24 11:07 nihohit

Could you post a code sample? I briefly looked at the docs and it says "SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well." which I didn't know - maybe in my testing I only ever tried to kill it before doing work and the subprocesses were never spawned.

boustrophedon avatar Jul 09 '24 18:07 boustrophedon

This sample completes ok locally, but in my Github Actions it hangs indefinitely after printing "closing temp DB"

#[tokio::test]
async fn pgtemp_test() {
    use pgtemp::PgTempDB;
    use sqlx::postgres::PgPoolOptions;

    let temp_db = PgTempDB::async_new().await;
    println!("got  temp DB");
    let address = format!(
        "postgresql://{}:{}@localhost:{}",
        temp_db.db_user(),
        temp_db.db_pass(),
        temp_db.db_port()
    );

    let pool = PgPoolOptions::new()
        .connect(&address)
        .await
        .expect("Can't connect to postgres");

    let conn = pool.acquire().await.unwrap();
    drop(conn);

    println!("starting teardown");
    pool.close().await;
    println!("pool2: size: {} idle: {}", pool.size(), pool.num_idle());
    drop(pool);
    println!("closing temp DB");
    drop(temp_db);
    println!("teardown complete");
}

nihohit avatar Jul 10 '24 07:07 nihohit

Per #10 the issue is I was dumb with how I designed the shutdown function - probably what happened was that I wrote shutdown as fn shutdown(mut self) but then I wanted to use it inside drop, which is implemented with fn drop(&mut self) so I thought "oh I'll just change the signature of shutdown". However, it's obvious in hindsight that shutdown will then be called twice.

There are a lot of options here to fix this and pretty much all of them are breaking changes (which tbh isn't that big a deal since it's not like I have a ton of users). Probably simplest solution (and also doesn't change the API) is the one you provided but give me a bit to think about it.

boustrophedon avatar Jul 10 '24 21:07 boustrophedon

No hurry, but the PR is a separate issue - I see in github that shutdown never completes on the first call.

On Thu, 11 Jul 2024, 0:20 Harry Stern, @.***> wrote:

Per #10 https://github.com/boustrophedon/pgtemp/pull/10 the issue is I was dumb with how I designed the shutdown function - probably what happened was that I wrote shutdown as fn shutdown(mut self) but then I wanted to use it inside drop, which is implemented with fn drop(&mut self) so I thought "oh I'll just change the signature of shutdown". However, it's obvious in hindsight that shutdown will then be called twice.

There are a lot of options here to fix this and pretty much all of them are breaking changes (which tbh isn't that big a deal since it's not like I have a ton of users). Probably simplest solution (and also doesn't change the API) is the one you provided but give me a bit to think about it.

— Reply to this email directly, view it on GitHub https://github.com/boustrophedon/pgtemp/issues/9#issuecomment-2221503424, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDEBBRYVD3XG7ABXU4BBWLZLWQSJAVCNFSM6AAAAABKSVYIEOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRRGUYDGNBSGQ . You are receiving this because you authored the thread.Message ID: @.***>

nihohit avatar Jul 11 '24 04:07 nihohit

So the PR doesn't solve the hanging issue?

boustrophedon avatar Jul 11 '24 18:07 boustrophedon

nope, it solves a different issue.

nihohit avatar Jul 11 '24 20:07 nihohit

Taking another look at this I wonder if the older version of pg in CI is different than what we have locally and maybe the behavior on sigkill is different, and a newer pg does better cleanup?

I can't get this to race locally with 1 or 100 connections.

boustrophedon avatar Aug 17 '24 20:08 boustrophedon

I don't think it's a version mismatch - checked this with CI & local versions both 16.3. It could be an OS difference - CI is Ubuntu, local is MacOS.

nihohit avatar Aug 17 '24 22:08 nihohit

I'm also on linux with 16.3. If you run the test in release mode does it hang in CI still?

boustrophedon avatar Aug 17 '24 23:08 boustrophedon

Yup.

On Sun, 18 Aug 2024, 2:17 Harry Stern, @.***> wrote:

I'm also on linux with 16.3. If you run the test in release mode does it hang in CI still?

— Reply to this email directly, view it on GitHub https://github.com/boustrophedon/pgtemp/issues/9#issuecomment-2295023315, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDEBBTDS5JIQ4MVSKBW5CDZR7KZ7AVCNFSM6AAAAABKSVYIEOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJVGAZDGMZRGU . You are receiving this because you authored the thread.Message ID: @.***>

nihohit avatar Aug 18 '24 04:08 nihohit

The newest release, v0.5.0, uses fast shutdown instead of sending SIGKILL. Could you check and see if this issue still occurs with the newest version?

boustrophedon avatar Sep 24 '24 04:09 boustrophedon

Did you get this to work?

jacobsvante avatar Feb 05 '25 21:02 jacobsvante

No, I tried it once, the issue seemed to have persisted, but I didn't have the time to investigate. Since then I've moved to another team, so I probably won't have another opportunity to test this, so @boustrophedon I guess the issue can be closed.

nihohit avatar Feb 05 '25 21:02 nihohit

Thanks @nihohit for the quick reply. I'm receiving the same error as you originally posted, so I don't think the issue should be closed.

@boustrophedon Have you been able to get it working in Github actions?

Update: Issue persists if I change to PgTempDB::new() instead of PgTempDB::async_new().

jacobsvante avatar Feb 05 '25 21:02 jacobsvante

@jacobsvante This is still happening on pgtemp 0.5?

boustrophedon avatar Feb 05 '25 22:02 boustrophedon

Yes, it appears so.

Do you have any integration tests?

jacobsvante avatar Feb 05 '25 22:02 jacobsvante

Yes, they're in the tests/ directory. I just added the sample test from this issue so we'll see what happens: https://github.com/boustrophedon/pgtemp/actions/runs/13167858114

boustrophedon avatar Feb 05 '25 22:02 boustrophedon

 Running tests/shutdown.rs (target/debug/deps/shutdown-9f5c6b24a836d5ff)

running 1 test test pgtemp_shutdown_test ... ok

The sample provided passed in CI (and then the build failed due to a linting error in the python client tests)

boustrophedon avatar Feb 05 '25 22:02 boustrophedon

The issue was that I had not read the requirements section of the readme, sorry a bit much on my plate lately...

I had forgot to add these lines:

      - name: Install postgres
        run: sudo apt-get install postgresql postgresql-client
      - name: Update path
        run: find /usr/lib/postgresql/ -type d -name "bin" >> $GITHUB_PATH

I'll see if I can find some time to add a check for this when running pgtest. Would make it less likely for stressed people like me being confused about why the tests are running for a long time and then just fail with a vague error message 😅

What is causing the issue btw?

jacobsvante avatar Feb 06 '25 21:02 jacobsvante

I don't know what would have caused your tests to hang - If I remove the postgres install and PATH lines, the tests simply error out. https://github.com/boustrophedon/pgtemp/actions/runs/13188950120/job/36817698923

boustrophedon avatar Feb 06 '25 23:02 boustrophedon

Strange. Maybe it was two tests running simultaneously. I'll see if I can find what was causing it.

jacobsvante avatar Feb 07 '25 06:02 jacobsvante