pg_partman icon indicating copy to clipboard operation
pg_partman copied to clipboard

Bug in partition_gap_fill() in 4.7.4

Open aleszeleny opened this issue 1 year ago • 9 comments

Hello, just a question, are you interested in fixes to 4.x (i.e. 4.7.4) version? If yes, shall I base a patch proposal on the commit 50ab3a0?

The bug was detected in our environment with logical replication (lock issue dute forced analyze) and the fix is quite simple, but since there is version 5 I don't know, whether, if yes how to properly propose a patch (for example, our next scheduled outage for upgrades after several months, so for me simply update to 4.7.5 would be good as it might be done online).

Thanks Ales Zeleny

aleszeleny avatar Jan 08 '24 18:01 aleszeleny

If you're able to do a PR against the current 5.x main branch I can take a look at it to see about backpatching it to 4.x

keithf4 avatar Jan 08 '24 18:01 keithf4

I haven't found the code in 5.x, what about creating a branch in my fork and let you review whether it is somewhere else applicable to 5.x ?

aleszeleny avatar Jan 08 '24 18:01 aleszeleny

I don't believe the code for that particular function has changed in 5.0 compared to 4.7.4, so if you did the PR against this file it should be the same?

https://github.com/pgpartman/pg_partman/blob/master/sql/functions/partition_gap_fill.sql

keithf4 avatar Jan 08 '24 19:01 keithf4

If it's easier to do against your own fork and just point me to it, that works too.

keithf4 avatar Jan 08 '24 19:01 keithf4

The proposed change to 4.7 is in the pull request. The reason why it does not apply to 5.x is removed p_analyze parameter in create_partition_<id | time>() functions, or I have missed something :-) Running analyze during gap_fill() -> create_partition_time() causes the logical replication worker to wait for row exclusive lock, while the process running hangs on waiting for transaction id, so it resulted in an infinite wait. Hope this helps.

The conflict on the control file is just a result as I've modified it for running tests (in case there will be a 4.x maintenance branch, it'll be hopefully OK) Changes I've made are here in the diff

Thanks, Ales Zeleny

aleszeleny avatar Jan 09 '24 12:01 aleszeleny

Ahh ok. Thank you!

keithf4 avatar Jan 09 '24 14:01 keithf4

So for version 5.x, the analyze step was removed from the individual create_partition functions and is now only done directly through the run_maintenance function. Prior to 5.x, since the analyze was being done by the actual creation functions, that meant the gap fill function would also trigger the analyze. This is no longer an issue in 5.0, so there's no need to adapt your patch for the latest release at this time.

I'll see what I can do for 4.x

keithf4 avatar Jan 16 '24 16:01 keithf4

If you're able to test, I do have a PR up with the 4.8.0 update to add this parameter to the gap fill function

https://github.com/pgpartman/pg_partman/pull/615

Again, this won't be an issue in 5.0 since the analyze step has been refactored and shouldn't run during gap fill calls anymore.

keithf4 avatar Jan 24 '24 17:01 keithf4

Thanks a lot!

I've looked at the patch and it seems to me one missing point, see https://github.com/pgpartman/pg_partman/pull/616 . Kind regards Ales Zeleny

aleszeleny avatar Jan 25 '24 15:01 aleszeleny

Version 5.1 and 4.8.0 have been released. Note that you can only install version 4.8.0 as an update from a previous version. Highly recommend looking into migrating to 5.1 as soon as possible.

keithf4 avatar Apr 05 '24 13:04 keithf4