pause icon indicating copy to clipboard operation
pause copied to clipboard

Pause Deadlocks/database timeouts

Open wolfsage opened this issue 2 years ago • 7 comments

I was able to reproduce the pause deadlocks and maybe the db timeouts on a pause docker image.

I believe the core problem is this:

Inside of a transaction (that's maaaaaaaaaaaybe a bit too long running) we do:

INSERT INTO primeur ...

And another process does this (outside of a transaction):

DELETE FROM perms WHERE LOWER(package) = LOWER(?)

The LOWER(package) prevents mysql from using an index lookup, which means it scans the entire table/index and locks way more than it needs to, and gets stuck behind the transaction with the INSERT.

I think to solve this we need to either...

  1. Get rid of all duplicate rows due to historical bad data, then modify the column to have default mysql collation which will mean 'X' = 'x' in searches so we can do plain index lookups
  2. Add another column "package_lower" which is the same as package but always lowercase, and use that
  3. Do a select before any update and delete to retrieve the correct case package names and then do the updates/deletes with that data

Option 1 is probably best, 2 is okay, 3 sucks

wolfsage avatar Apr 29 '23 15:04 wolfsage

Along with this... we should:

  1. find a way to shorten the transactions so the mysql modifying bits only happen when necessary
  2. Move $dio->normalize_package_casing; into the transaction in lib/PAUSE/mldistwatch.pm

It not being in the transaction means when a delete fails on primeur, the delete that removes perms happened already, and that's what leads to https://github.com/andk/pause/issues/372

wolfsage avatar Apr 29 '23 15:04 wolfsage

We are going with option 1 which should negate the deadlocks almost entirely (except when folks updoad dists that share package names around the same time)

wolfsage avatar Apr 30 '23 14:04 wolfsage

https://github.com/wolfsage/pause/tree/no-more-deadlocks (need to clean up and write a deployment plan but tests all pass)

wolfsage avatar Apr 30 '23 14:04 wolfsage

MR opened!

wolfsage avatar Apr 30 '23 15:04 wolfsage

I'm working through the duplicates, contacting authors to explain the problem and asking if we can remove old releases and do the other things needed. This is part of @wolfsage's option 1 above.

neilb avatar May 09 '23 15:05 neilb

Andreas and I have worked through all the duplicates, and by either deleting old releases (with the author's permission), or deleting the old entry from the index, we are now duplicate-free.

neilb avatar May 22 '23 08:05 neilb

What's the next step here?

karenetheridge avatar Jul 26 '23 22:07 karenetheridge

We believe this should now be fixed and the issue can be closed. Mixed deployment of #440 and #424 has happened just a few minutes ago.

andk avatar Apr 26 '24 15:04 andk

Hooray!

neilb avatar Apr 26 '24 15:04 neilb