gpbackup
Exception when restoring timestamp field across major versions
When restoring a Greenplum 5 backup to Greenplum 6, one table failed to import with the following error:
20210502:02:06:50 gprestore:gpadmin:gp6mdw:008984-[ERROR]:-Error loading data into table dw.fact_pca_yield_unit: COPY fact_pca_yield_unit, line 37486852, column test_date: "2017-12-12-3B0": ERROR: invalid input syntax for type timestamp: "2017-12-12-3B0" (seg8 10.13.0.54:40000 pid=15339) (SQLSTATE 22007)
I suspect the timestamp value has an abnormal format, so I tried to retrieve the data for that time period. But because the data volume is very large, I could not locate the exact record in the end.
In response to the above, I have two questions:
- How can I quickly locate the specific record from the line number 37486852 reported in the error?
- Why does Greenplum 5 accept this abnormal timestamp format? Is there a potential bug?
Hey, this might be a known limitation of restoring GPDB5 data to GPDB6. It would be good to review the migration documentation located here: https://greenplum.docs.pivotal.io/6-16/install_guide/migrate.html.
From the documentation, it lists the following note:

Greenplum 6 no longer automatically converts from the deprecated timestamp format YYYYMMDDHH24MISS. The format could not be parsed unambiguously in previous Greenplum Database releases. You can still specify the YYYYMMDDHH24MISS format in conversion functions such as to_timestamp and to_char for compatibility with other database systems. You can use input formats for converting text to date or timestamp values to avoid unexpected results or query execution failures. For example, this SELECT command returns a timestamp in Greenplum Database 5 and fails in 6.
SELECT to_timestamp('20190905140000');
To convert the string to a timestamp in Greenplum Database 6, you must use a valid format. Both of these commands return a timestamp in Greenplum Database 6. The first example explicitly specifies a timestamp format. The second example uses the string in a format that Greenplum Database recognizes.
SELECT to_timestamp('20190905140000','YYYYMMDDHH24MISS');
SELECT to_timestamp('20190905 140000');
The timestamp issue also applies when you use the :: syntax. In Greenplum Database 6, the first command returns an error. The second command returns a timestamp.
SELECT '20190905140000'::timestamp ;
SELECT '20190905 140000'::timestamp ;
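As an additional quick check (not from the documentation), it may be worth testing how the exact literal from the error message behaves on each version:
SELECT '2017-12-12-3B0'::timestamp;
If this fails on the 5X cluster as well, the value most likely was never stored in that form, and the problem lies elsewhere in the backup/restore path.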
I haven't found such a format in my data; I think it is due to other reasons.
I used the following method to export the data for the abnormal time period (provided as test.tar.gz):
$ copy (select test_date from dw.fact_pca_yield_unit where test_date between '2017-12-12 00:00:00' and '2017-12-13 00:00:00') to '/tmp/test.csv' WITH csv;
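As a rough check on that export (a sketch, assuming each line of /tmp/test.csv contains only the test_date value), lines that do not look like a normal timestamp can be listed with:
$ grep -Evn '^[0-9]{4}-[0-9]{2}-[0-9]{2}( [0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?)?$' /tmp/test.csv
An empty result would suggest the abnormal value is not present in the export at all.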
Hey, sorry for the late response on this GitHub issue. Are you still having issues with this?
I haven't found such a format. I think it is due to other reasons. I used the following method to export test.tar.gz in the abnormal time period...
Yes, it seems like the abnormal timestamp does not exist in the tarball you provided (are you able to COPY the data from that tarball into a simple timestamp table on your 6X cluster?). And I'm not able to reproduce the issue. Could you try doing the following:
- What is the exact data type of the test_date column on the 5X cluster? Try running SELECT a.atttypid, t.typname FROM pg_attribute a JOIN pg_type t ON a.atttypid = t.oid WHERE a.attrelid = 'dw.fact_pca_yield_unit'::regclass AND a.attname = 'test_date'; on both your 5X cluster and your 6X cluster.
- Try restoring the same backup (with the --include-table dw.fact_pca_yield_unit flag to target only the problem table) to your 5X cluster in a temporary database. See whether the error shows up on the 5X cluster or not.
- This may be inconvenient... try examining the backup data file to get the abnormal row if you can. It'll be under your seg8 data directory (from the seg8 10.13.0.54 log output) in backups/<YYYYMMDD>/<backup_timestamp>/gpbackup_8_<backup_timestamp>_<table_oid>.gz. If the data file is stored in S3, you may need to retrieve it from the S3 bucket to examine it. To get the table OID, simply run SELECT 'dw.fact_pca_yield_unit'::regclass::oid. One way to pull out the reported line is sketched after this list.
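For that last point, a minimal sketch, assuming the backup data file is the usual gzipped plain-text COPY data written by gpbackup and that the line number reported by COPY maps directly to a line in that file (adjust the placeholder path for your backup):
$ cd <seg8_data_directory>/backups/<YYYYMMDD>/<backup_timestamp>
$ zcat gpbackup_8_<backup_timestamp>_<table_oid>.gz | sed -n '37486852{p;q}'   # print only line 37486852, then stop reading
If the printed row already contains the bad value 2017-12-12-3B0, the problem is in the backup file itself; if the row looks normal, the value is more likely being mangled on the restore side.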
Essentially... we need to isolate where the issue is. At the moment, I'm unable to tell which of the following it is:
- There's a row with an abnormal timestamp in your table on the 5X cluster. If this is the case, the row needs to be fixed or removed (see the sketch after this list for one way to look for such rows).
- There's an abnormal timestamp that was incorrectly written into the backup data. If this is the case, the table unfortunately needs to be backed up again.
- There's an abnormal timestamp that is interpreted incorrectly on the 6X cluster on the restore attempt. If this is the case, we'll need to take the abnormal timestamp sample and create a bug fix for GPDB 6.
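If it does turn out to be the first case, a hedged sketch for hunting down such rows on the 5X cluster (assuming test_date really is a timestamp column; casting it to text lets you compare its printed form against the expected pattern, though this does force a full scan):
SELECT gp_segment_id, ctid, test_date
FROM dw.fact_pca_yield_unit
WHERE test_date::text !~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}( [0-9]{2}:[0-9]{2}:[0-9]{2}([.][0-9]+)?)?$';
If the column is actually text rather than timestamp, the same predicate applies directly to the raw value.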