AO table `block checksum does not match, expected 0x0D869A62 and found 0x3A2422A1`

Open cobolbaby opened this issue 2 years ago • 6 comments

Bug Report

Greenplum version or build

  • 6.19.1

OS version and uname -a

autoconf options used ( config.status --config )

Installation information ( pg_config )

Expected behavior

Actual behavior

/usr/local/greenplum-db/bin/gpbackup \
    --dbname F3_BDC \
    --plugin-config /opt/greenplum/config/gpbackup_s3_archive.yml \
    --include-table ict.ictlogtestpart_ao_old \
    --leaf-partition-data \
    --with-stats \
    --jobs 4 --quiet
...
20220719:16:33:36 gpbackup:gpadmin:gp6mdw:1434108-[DEBUG]:-Worker 3: COPY ict.ictlogtestpart_ao_old_1_prt_117 TO PROGRAM 'gzip -c -1 | /usr/local/greenplum-db-6.19.1/bin/gpbackup_s3_plugin backup_data /tmp/20220719161119_gpbackup_s3_archive.yml <SEG_DATA_DIR>/backups/20220719/20220719161119/gpbackup_<SEGID>_20220719161119_866786.gz' WITH CSV DELIMITER ',' ON SEGMENT IGNORE EXTERNAL PARTITIONS;
20220719:16:34:40 gpbackup:gpadmin:gp6mdw:1434108-[CRITICAL]:-ERROR: block checksum does not match, expected 0xFBE15EF2 and found 0xCEBAEAB4  (seg9 10.13.0.33:40001 pid=3432528) (SQLSTATE XX001)
github.com/greenplum-db/gpbackup/backup.backupDataForAllTables
        /tmp/build/3e49593f/go/src/github.com/greenplum-db/gpbackup/backup/data.go:365
github.com/greenplum-db/gpbackup/backup.backupData
        /tmp/build/3e49593f/go/src/github.com/greenplum-db/gpbackup/backup/backup.go:291
github.com/greenplum-db/gpbackup/backup.DoBackup
        /tmp/build/3e49593f/go/src/github.com/greenplum-db/gpbackup/backup/backup.go:169
main.main.func1
        /tmp/build/3e49593f/go/src/github.com/greenplum-db/gpbackup/gpbackup.go:23
github.com/spf13/cobra.(*Command).execute
        /tmp/build/3e49593f/go/pkg/mod/github.com/spf13/[email protected]/command.go:860
github.com/spf13/cobra.(*Command).ExecuteC
        /tmp/build/3e49593f/go/pkg/mod/github.com/spf13/[email protected]/command.go:974
github.com/spf13/cobra.(*Command).Execute
        /tmp/build/3e49593f/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
main.main
        /tmp/build/3e49593f/go/src/github.com/greenplum-db/gpbackup/gpbackup.go:27
runtime.main
        /usr/local/go/src/runtime/proc.go:255
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1581
...

This problem appears to be caused by a checksum failure in the AO table itself, rather than by gpbackup.

Ref: https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/expected/ao_checksum_corruption.out

ERROR:  block checksum does not match, expected 0x2744B150 and found 0x3089A5CE  (seg9 slice1 10.13.0.35:50001 pid=2334835)
DETAIL:  Append-Only storage Small Content header: smallcontent_bytes_0_3 0x19023C0F, smallcontent_bytes_4_7 0xEDC01080, headerKind = 1, executorBlockKind = 1, rowCount = 143, usingChecksums = true, header checksum 0x27A8872A, block checksum 0x2744B150, dataLength 32622, compressedLength 4224, overallBlockLen 4240
CONTEXT:  Scan of Append-Only Row-Oriented relation 'ictlogtestpart_ao_old_1_prt_45'. Append-Only segment file 'base/863243/2014352.1', block header offset in file = 47578392, bufferCount 11239
SQL state: XX001

Step to reproduce the behavior

cobolbaby avatar Jul 19 '22 08:07 cobolbaby

What causes this problem? I have been hitting it frequently of late when backing up Greenplum.

How should I recover my data?

cobolbaby avatar Jul 20 '22 10:07 cobolbaby

Checksum errors mostly flag hardware/disk issues, since the checksum is calculated right before the data is written to disk and verified right after the data is read back from disk.

Do you get this error on both the primary and the mirror? For example, try failing over to the mirror and running the query, or copy the file from the mirror to the primary if you are sure no one is writing to this AO table.
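
To illustrate the point about checksums being computed before a write and verified after a read, here is a minimal Python sketch. It is not Greenplum's implementation: the 4-byte header layout, the names `write_block`, `read_block`, and `ChecksumMismatch`, and the use of `zlib.crc32` as the checksum are all assumptions made for illustration only.

```python
import struct
import zlib

# Hypothetical block layout: a 4-byte big-endian checksum followed by the payload.
# This is NOT the real Append-Only block header format.
HEADER = struct.Struct(">I")


class ChecksumMismatch(Exception):
    """Raised when the stored checksum differs from the recomputed one."""


def write_block(payload: bytes) -> bytes:
    """Compute the checksum right before the block would be written to disk."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(checksum) + payload


def read_block(block: bytes) -> bytes:
    """Recompute the checksum right after the block is read back and compare."""
    (expected,) = HEADER.unpack_from(block)
    payload = block[HEADER.size:]
    found = zlib.crc32(payload) & 0xFFFFFFFF
    if found != expected:
        raise ChecksumMismatch(
            f"block checksum does not match, "
            f"expected 0x{expected:08X} and found 0x{found:08X}"
        )
    return payload
```

In this sketch, any change to the stored bytes between `write_block` and `read_block` surfaces as a mismatch at read time. That is why an error of this kind usually points at what happened to the data after it was written, rather than at the reader.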

ashwinstar avatar Jul 20 '22 18:07 ashwinstar

Do you get this error on both the primary and the mirror?

Only the primary segment reports this issue. I wonder whether an interrupted data write followed by a restart of the database host could cause this problem.

cobolbaby avatar Jul 21 '22 01:07 cobolbaby

Only the primary segment reports this issue. I wonder whether an interrupted data write followed by a restart of the database host could cause this problem.

This error indicates that the data on disk is corrupted, and there are many possible causes. Greenplum detects the failure, and that is the most it can do. So is it OK to close this issue? Thanks!

lij55 avatar Aug 02 '22 00:08 lij55

But I didn't see any I/O errors in the output of dmesg -T -d. Generally, some I/O errors show up before a disk fails.

cobolbaby avatar Aug 02 '22 04:08 cobolbaby

The same question was asked here: https://www.modb.pro/db/27760

cobolbaby avatar Aug 03 '22 02:08 cobolbaby