HAWQ-1078. Implement hawqsync-falcon DR utility.
This is the initial commit for a Python utility to orchestrate a DR synchronization for HAWQ, based on Falcon HDFS replication and a cold backup of the active HAWQ master's MASTER_DATA_DIRECTORY.
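For intuition, the cold-backup half of the orchestration can be pictured roughly as below. This is an illustrative sketch only, not the utility's actual code; the MASTER_DATA_DIRECTORY path and the HDFS staging URI are hypothetical placeholders for site-specific values.

```python
#!/usr/bin/env python
# Illustrative sketch only -- not the hawqsync-falcon implementation.
# Assumptions: MASTER_DATA_DIRECTORY path and HDFS staging URI are
# site-specific placeholders; the master is quiesced before the backup.
import subprocess
import tarfile
import time

MASTER_DATA_DIRECTORY = "/data/hawq/master"          # hypothetical path
HDFS_STAGING = "hdfs://hawq_default/masterBackups"   # hypothetical URI

def cold_backup_master():
    """Tar the (quiesced) master data directory and push it to HDFS."""
    stamp = time.strftime("%Y%m%d%H%M%S")
    tarball = "/tmp/hawqMaster-{0}.tar.bz2".format(stamp)
    with tarfile.open(tarball, "w:bz2") as tar:
        tar.add(MASTER_DATA_DIRECTORY, arcname="MASTER_DATA_DIRECTORY")
    # copyFromLocal is a standard HDFS shell verb
    subprocess.check_call(["hdfs", "dfs", "-copyFromLocal",
                           tarball, HDFS_STAGING + "/"])

if __name__ == "__main__":
    cold_backup_master()
```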
A code review would be greatly appreciated when someone has cycles. Active testing is currently underway in a production deployment.
@vvineet How can we get this prioritized for the next release? Also, anyone who can put eyes on it for a code review would be a great help.
@kdunn-pivotal: I propose a discussion including @ictmalili, as this ties in with the HAWQ Register feature. I'd love to see the contribution make it into HAWQ.
HAWQSYNC partial-sync recovery runbook:

1. Copy the "last known good state" tarball from hdfs://hawq_default/hawqExtract-*.tar.bz2.
2. Re-run hawqsync-extract to establish the "current state".
3. Perform a diff for every table file to determine which tables have inconsistencies.
4. For each inconsistent table (scripted end-to-end in the sketch following this runbook):
   a. Re-register faultyTable using the "last known good" YAML, which updates the EOF field only:
      hawq register --force -f faultyTable.yaml faultyTable
   b. Store the valid records in a temporary table:
      CREATE TABLE newTemp AS SELECT * FROM faultyTable
   c. Truncate the faulty table, to allow the catalog and HDFS file sizes to be consistent again:
      TRUNCATE faultyTable
   d. Re-populate the table with the valid records:
      INSERT INTO faultyTable SELECT * FROM newTemp
   e. Purge the temporary table:
      DROP TABLE newTemp
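A minimal sketch of how steps a-e might be scripted for one table follows, assuming the `hawq` and `psql` binaries are on the PATH; the database name, and the faultyTable/newTemp identifiers carried over from the runbook, are placeholders.

```python
#!/usr/bin/env python
# Sketch of runbook steps a-e for one inconsistent table.
# Assumptions: `hawq` and `psql` are on PATH; DBNAME is a placeholder.
import subprocess

DBNAME = "gpadmin"  # hypothetical target database

def run_sql(sql):
    """Execute one SQL statement via psql."""
    subprocess.check_call(["psql", "-d", DBNAME, "-c", sql])

def recover_table(faulty_table, yaml_path):
    # a. Re-register with the "last known good" YAML (fixes catalog EOF)
    subprocess.check_call(["hawq", "register", "--force",
                           "-f", yaml_path, faulty_table])
    # b. Preserve the valid records in a temporary table
    run_sql("CREATE TABLE newTemp AS SELECT * FROM {0}".format(faulty_table))
    # c. Truncate so catalog EOF and HDFS file size agree again
    run_sql("TRUNCATE {0}".format(faulty_table))
    # d. Re-populate the table from the preserved copy
    run_sql("INSERT INTO {0} SELECT * FROM newTemp".format(faulty_table))
    # e. Purge the temporary table
    run_sql("DROP TABLE newTemp")

if __name__ == "__main__":
    recover_table("faultyTable", "faultyTable.yaml")
```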
Overall, this process ensures the catalog EOF marker and the actual HDFS file size are properly aligned for every table. This is especially important when ETL needs to resume on tables that may previously have had "inconsistent bytes" appended, as would be the case after a partial sync.
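As a sanity check for that alignment, one could compare the EOF sizes recorded in the extract YAML against the live HDFS file sizes. The sketch below assumes the YAML carries per-file path/size entries under an AO_FileLocations/Files key (verify against your actual hawq extract output) and uses PyYAML, which may need to be installed separately.

```python
#!/usr/bin/env python
# Sketch: flag HDFS files whose size disagrees with the catalog EOF
# recorded in the extract YAML.
# Assumption: the YAML carries per-file {path, size} entries under
# AO_FileLocations/Files -- check this against your actual YAML layout.
import subprocess
import yaml  # PyYAML

def hdfs_size(path):
    """Return the byte size of an HDFS file (`-stat %b` prints bytes)."""
    out = subprocess.check_output(["hdfs", "dfs", "-stat", "%b", path])
    return int(out.strip())

def find_inconsistent(yaml_path):
    """List table files whose HDFS size differs from the recorded EOF."""
    with open(yaml_path) as f:
        meta = yaml.safe_load(f)
    bad = []
    for entry in meta.get("AO_FileLocations", {}).get("Files", []):
        if hdfs_size(entry["path"]) != entry["size"]:
            bad.append(entry["path"])
    return bad
```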