pause Pause Deadlocks/database timeouts

I was able to reproduce the pause deadlocks and maybe the db timeouts on a pause docker image.

I believe the core problem is this:

Inside of a transaction (that's maaaaaaaaaaaybe a bit too long running) we do:

INSERT INTO primeur ...

And another process does this (outside of a transaction):

DELETE FROM perms WHERE LOWER(package) = LOWER(?)

The LOWER(package) prevents mysql from using an index lookup, which means it scans the entire table/index and locks way more than it needs to, and gets stuck behind the transaction with the INSERT.

I think to solve this we need to either...

Get rid of all duplicate rows due to historical bad data, then modify the column to have default mysql collation which will mean 'X' = 'x' in searches so we can do plain index lookups
Add another column "package_lower" which is the same as package but always lowercase, and use that
Do a select before any update and delete to retrieve the correct case package names and then do the updates/deletes with that data

Option 1 is probably best, 2 is okay, 3 sucks

Apr 29 '23 15:04 wolfsage

Along with this... we should:

find a way to shorten the transactions so the mysql modifying bits only happen when necessary
Move $dio->normalize_package_casing; into the transaction in lib/PAUSE/mldistwatch.pm

It not being in the transaction means when a delete fails on primeur, the delete that removes perms happened already, and that's what leads to https://github.com/andk/pause/issues/372

Apr 29 '23 15:04 wolfsage

We are going with option 1 which should negate the deadlocks almost entirely (except when folks updoad dists that share package names around the same time)

Apr 30 '23 14:04 wolfsage

https://github.com/wolfsage/pause/tree/no-more-deadlocks (need to clean up and write a deployment plan but tests all pass)

Apr 30 '23 14:04 wolfsage

MR opened!

Apr 30 '23 15:04 wolfsage

I'm working through the duplicates, contacting authors to explain the problem and asking if we can remove old releases and do the other things needed. This is part of @wolfsage's option 1 above.

May 09 '23 15:05 neilb

Andreas and I have worked through all the duplicates, and by either deleting old releases (with the author's permission), or deleting the old entry from the index, we are now duplicate-free.

May 22 '23 08:05 neilb

What's the next step here?

Jul 26 '23 22:07 karenetheridge

We believe this should now be fixed and the issue can be closed. Mixed deployment of #440 and #424 has happened just a few minutes ago.

Apr 26 '24 15:04 andk

Hooray!

Apr 26 '24 15:04 neilb