
[6X backport] Improve performance of pgarch_readyXlog() with many failed status files and prioritize history files

Open SunilS26 opened this issue 1 year ago • 0 comments

[!NOTE] This PR is a partial backport of GPDB PR https://github.com/greenplum-db/gpdb/pull/16984

This PR backports the following upstream commits:

  1. https://github.com/postgres/postgres/commit/b981df4cc09aca978c5ce55e437a74913d09cccc
  2. https://github.com/postgres/postgres/commit/beb4e9ba1652a04f66ff20261444d06f678c0b2d
  3. https://github.com/postgres/postgres/commit/1fb17b1903414676bd371068739549cd2966fe87

* For more information on the actual changes, please refer to the relevant commit messages.

Original Commit Messages:

Prioritize history files when archiving

Prioritize history files when archiving

At the end of recovery for the post-promotion process, a new history
file is created followed by the last partial segment of the previous
timeline.  Based on the timing, the archiver would first try to archive
the last partial segment and then the history file.  This can delay the
detection of a new timeline taken, particularly depending on the time it
takes to transfer the last partial segment as it delays the moment the
history file of the new timeline gets archived.  This can cause promoted
standbys to use the same timeline as one already taken depending on the
circumstances if multiple instances look at archives at the same
location.

This commit changes the order of archiving so as history files are
archived in priority over other file types, which reduces the likelihood
of the same timeline being taken (still not reducing the window to
zero), and it makes the archiver behave more consistently with the
startup process doing its post-promotion business.

Author: David Steele
Reviewed-by: Michael Paquier, Kyotaro Horiguchi
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 9.5
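
The ordering rule described in the commit message above can be illustrated with a small, self-contained C sketch. This is not the upstream code: the helper names `is_history_file`, `archive_priority_cmp`, and `sort_for_archiving` are hypothetical, the history check is a plain `.history` suffix test, and `qsort()` stands in for whatever data structure the archiver actually uses. It only shows the idea that history files sort ahead of every other file, with remaining ties broken by name.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if the name ends in ".history". */
bool
is_history_file(const char *name)
{
    static const char suffix[] = ".history";
    size_t      nlen = strlen(name);
    size_t      slen = sizeof(suffix) - 1;

    return nlen > slen && strcmp(name + nlen - slen, suffix) == 0;
}

/*
 * qsort() comparator: history files sort before everything else.
 * Within the same class, names are compared as plain strings, which
 * orders WAL segment names from oldest to newest.
 */
int
archive_priority_cmp(const void *a, const void *b)
{
    const char *fa = *(const char *const *) a;
    const char *fb = *(const char *const *) b;
    bool        ha = is_history_file(fa);
    bool        hb = is_history_file(fb);

    if (ha != hb)
        return ha ? -1 : 1;
    return strcmp(fa, fb);
}

/* Sort an array of file names into the order they would be archived. */
void
sort_for_archiving(const char **names, size_t count)
{
    qsort(names, count, sizeof(names[0]), archive_priority_cmp);
}
```

Given the three names that appear in the "Previous Behaviour" log of this PR (`000000020000000000000005`, `000000010000000000000005.partial`, `00000002.history`), this comparator places `00000002.history` first, followed by the partial segment of timeline 1 and then the segment of timeline 2.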

Performance improvement of archive_status directory scan with many failed status files

Improve performance of pgarch_readyXlog() with many status files.

Presently, the archive_status directory was scanned for each file to
archive.  When there are many status files, say because archive_command
has been failing for a long time, these directory scans can get very
slow.  With this change, the archiver remembers several files to archive
during each directory scan, speeding things up.

To ensure timeline history files are archived as quickly as possible,
XLogArchiveNotify() forces the archiver to do a new directory scan as
soon as the .ready file for one is created.

Nathan Bossart, per a long discussion involving many people. It is
not clear to me exactly who out of all those people reviewed this
particular patch.

Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com
Discussion: http://postgr.es/m/[email protected]
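
The caching idea in the commit message above can be sketched as follows. This is a simplified illustration under stated assumptions, not the backported code: the directory is simulated by an in-memory `StatusDir`, the constants `BATCH_SIZE`, `NAME_BUF`, and `MAX_READY` and all function names are hypothetical, a bounded insertion sort replaces the real data structure, and history-file priority is left out so that only the batching and the forced rescan are shown.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BATCH_SIZE 4    /* hypothetical; kept small for illustration */
#define NAME_BUF   64   /* hypothetical; includes room for the NUL byte */
#define MAX_READY  16   /* capacity of the simulated directory */

/* Simulated archive_status directory: names that still have a .ready file. */
typedef struct StatusDir
{
    const char *names[MAX_READY];
    int         count;
} StatusDir;

/* Names remembered from the last scan, in the order they are handed out. */
typedef struct ArchiveBatch
{
    char        names[BATCH_SIZE][NAME_BUF];
    int         count;          /* entries filled by the last scan */
    int         next;           /* next entry to hand out */
    int         scans;          /* directory scans performed so far */
    bool        force_scan;     /* set when a history file becomes ready */
} ArchiveBatch;

bool
dir_contains(const StatusDir *dir, const char *name)
{
    for (int i = 0; i < dir->count; i++)
        if (strcmp(dir->names[i], name) == 0)
            return true;
    return false;
}

/* Simulate a successful archive: the .ready entry disappears. */
void
dir_remove(StatusDir *dir, const char *name)
{
    for (int i = 0; i < dir->count; i++)
        if (strcmp(dir->names[i], name) == 0)
        {
            dir->names[i] = dir->names[--dir->count];
            return;
        }
}

/* One full pass over the directory, keeping the BATCH_SIZE smallest names. */
void
scan_directory(ArchiveBatch *batch, const StatusDir *dir)
{
    batch->count = batch->next = 0;
    batch->scans++;
    for (int i = 0; i < dir->count; i++)
    {
        int         pos = batch->count;
        int         last;

        while (pos > 0 && strcmp(dir->names[i], batch->names[pos - 1]) < 0)
            pos--;
        if (pos >= BATCH_SIZE)
            continue;           /* sorts after everything already kept */
        last = (batch->count < BATCH_SIZE) ? batch->count : BATCH_SIZE - 1;
        if (last > pos)         /* shift later entries down by one slot */
            memmove(batch->names[pos + 1], batch->names[pos],
                    (size_t) (last - pos) * NAME_BUF);
        snprintf(batch->names[pos], NAME_BUF, "%s", dir->names[i]);
        if (batch->count < BATCH_SIZE)
            batch->count++;
    }
}

/* Return the next file to archive, rescanning only when necessary. */
const char *
next_file_to_archive(ArchiveBatch *batch, const StatusDir *dir)
{
    if (batch->force_scan)
    {
        batch->count = batch->next = 0;     /* discard cached names */
        batch->force_scan = false;
    }
    /* Serve from the cache, skipping names whose .ready file is gone. */
    while (batch->next < batch->count)
    {
        const char *name = batch->names[batch->next++];

        if (dir_contains(dir, name))
            return name;
    }
    scan_directory(batch, dir);
    return (batch->count > 0) ? batch->names[batch->next++] : NULL;
}
```

With six ready files and a batch size of four, the first four calls to `next_file_to_archive()` cost one scan in total, and the fifth call triggers the second scan. Setting `force_scan` models the notification described above: the cached names are dropped and the next call rescans immediately.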

Archiver Crash fix due to corrupted status file name

Fix issues in pgarch's new directory-scanning logic.

The arch_filenames[] array elements were one byte too small, so that
a maximum-length filename would get corrupted if another entry
were made after it.  (Noted by Thomas Munro, fix by Nathan Bossart.)

Move these arrays into a palloc'd struct, so that we aren't wasting
a few kilobytes of static data in each non-archiver process.

Add a binaryheap_reset() call to make it plain that we start the
directory scan with an empty heap.  I don't think there's any live
bug of that sort, but it seems fragile, and this is very cheap
insurance.

Cleanup for commit https://github.com/postgres/postgres/commit/beb4e9ba1652a04f66ff20261444d06f678c0b2d, so no back-patch needed.

Discussion: https://postgr.es/m/CA+hUKGLHAjHuKuwtzsW7uMJF4BVPcQRL-UMZG_HM-g0y7yLkUg@mail.gmail.com
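
The off-by-one described in the commit message above comes down to reserving room for the terminating NUL byte in each array element. The following is a minimal sketch of the corrected sizing, not the upstream code: `MAX_NAME_CHARS`, `SLOTS`, `store_name`, and `stored_name` are hypothetical names, and the value 40 is chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME_CHARS 40   /* assumed longest name, excluding the NUL */
#define SLOTS          2

/*
 * Each slot holds MAX_NAME_CHARS characters plus the terminating NUL.
 * If the second dimension were only MAX_NAME_CHARS, the NUL of a
 * maximum-length name would fall into the first byte of the next slot
 * and be overwritten as soon as that slot was filled, so the first
 * name would appear to run on into the second.
 */
static char slots[SLOTS][MAX_NAME_CHARS + 1];

/* Copy a name into a slot; returns 0 on success, -1 if it does not fit. */
int
store_name(int slot, const char *name)
{
    size_t      len = strlen(name);

    if (slot < 0 || slot >= SLOTS || len > MAX_NAME_CHARS)
        return -1;
    memcpy(slots[slot], name, len + 1);     /* len + 1 copies the NUL too */
    return 0;
}

const char *
stored_name(int slot)
{
    return slots[slot];
}
```

Storing a 40-character name in slot 0 and then any name in slot 1 leaves slot 0 intact, because the extra byte keeps its terminator inside its own element.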

${\color{RoyalBlue} \underline{\textbf{Summary:}} }$

In brief, this PR introduces the following changes:

  • Prioritizes history files when pushing status files to the archive location.
  • Reduces the number of archive_status directory scans when multiple .ready files exist.

This improves performance because, when there are many failed status files (say, because archive_command has been failing for a long time), these directory scans can get very slow (for example, on cloud or network storage such as S3, GCP, Azure, or NFS). With this change, the archiver remembers several files to archive during each directory scan, speeding things up.


${\color{RoyalBlue} \underline{\textbf{Testing:}}}$

  • Manually verified the changes
  • Ran some basic sanity tests

Log Snippet:

Previous Behaviour:

The archiver scans the archive_status directory every time it needs the next status file name, without any prioritization.


2024-01-22 12:18:09.983240 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN...",,,,,,,0,,"pgarch.c",722,
2024-01-22 12:18:09.983336 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Scanning archive_status directory",,,,,,,0,,"pgarch.c",727,
2024-01-22 12:18:09.983492 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: READDIR Found status file  00000002.history.ready",,,,,,,0,,"pgarch.c",736,
2024-01-22 12:18:09.984954 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: READDIR Found status file  000000020000000000000005.ready ",,,,,,,0,,"pgarch.c",741,
2024-01-22 12:18:09.985043 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: READDIR Found status file  000000010000000000000005.partial.ready",,,,,,,0,,"pgarch.c",741,
2024-01-22 12:18:09.985082 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: END... Returning status file 000000010000000000000005.partial",,,,,,,0,,"pgarch.c",755,

2024-01-22 12:18:12.114753 IST,,,p89087,th-492510976,,,,0,,,seg0,,,,,"WARNING","01000","archiving transaction log file ""000000010000000000000005.partial"" failed too many times, will try again later",,,,,,,0,,"pgarch.c",515,

New Behaviour:

The archiver reuses the status file names cached during the previous directory scan to feed the archive command, and rescans only when the cache is empty.


2024-01-21 13:53:51.001047 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:0 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.001824 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002008 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002061 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002368 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002415 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000020000000000000008 ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.002473 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002525 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002568 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 00000002.history ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.002612 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002662 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000010000000000000002 ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.002706 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002751 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000010000000000000006.partial ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.002800 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.002846 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000020000000000000009 ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.003117 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.003170 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000020000000000000007 ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.003246 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,
2024-01-21 13:53:51.003289 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir found 000000020000000000000006 ",,,,,,,0,,"pgarch.c",834,
2024-01-21 13:53:51.003341 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: END... XLOG File 00000002.history   files remaining:6 ",,,,,,,0,,"pgarch.c",874,

<<<<<<<<<< 00000002.history  file archive is successful

2024-01-21 13:53:51.030751 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:6 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.030840 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.030895 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000010000000000000002 ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.053162 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:5 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.053367 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.053441 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000010000000000000006.partial ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.073236 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:4 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.073332 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.073374 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000020000000000000006 ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.097333 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:3 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.097438 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.097478 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000020000000000000007 ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.117122 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:2 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.117949 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.118003 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000020000000000000008 ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.138403 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:1 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.138531 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Found Cached Xlog status file details ",,,,,,,0,,"pgarch.c",760,
2024-01-21 13:53:51.138580 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: Returning Cached XLOG File 000000020000000000000009 ",,,,,,,0,,"pgarch.c",772,

2024-01-21 13:53:51.151769 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug: BEGIN... files remaining:0 ",,,,,,,0,,"pgarch.c",750,
2024-01-21 13:53:51.151988 IST,,,p25877,th-492510976,,,,0,,,seg0,,,,,"LOG","00000","Backport Debug:ReadDir Scanning archive_status directory ",,,,,,,0,,"pgarch.c",798,

Here are some reminders before you submit the pull request

  • [ ] Add tests for the change
  • [X] Document changes
  • [ ] Communicate in the mailing list if needed
  • [X] Pass make installcheck
  • [ ] Review a PR in return to support the community

SunilS26, Jan 22 '24 08:01