ReplicaDB icon indicating copy to clipboard operation
ReplicaDB copied to clipboard

postgres sink - extra data after last expected column

Open ScottChapman opened this issue 1 year ago • 7 comments

Describe the bug Running replication between MySQL -> Postgres

./bin/replicadb --options-file replicadb.conf
2023-07-03 08:49:46,554 INFO  ReplicaDB:63 Running ReplicaDB version: 0.15.0
2023-07-03 08:49:46,559 INFO  ReplicaDB:66 Setting verbose mode DEBUG
2023-07-03 08:49:46,559 DEBUG ReplicaDB:67 ToolOptions{
	sourceConnect='jdbc:mysql://myserver:3306/bugs?useSSL=true&trustServerCertificate=true',
	sourceUser='whatever',
	sourcePassword='****',
	sourceTable='bugs.attach_data',
	sourceColumns='null',
	sourceWhere='null',
	sourceQuery='null',
	sinkConnect='jdbc:postgresql://myserver:5432/bugs',
	sinkUser='whatever',
	sinkPassword='****',
	sinkTable='attach_data',
	sinkStagingTable='null',
	sinkStagingSchema='null',
	sinkStagingTableAlias='null',
	sinkColumns='null',
	sinkDisableEscape=false,
	sinkDisableIndex=false,
	sinkDisableTruncate=false,
	sinkAnalyze=false,
	jobs=5,
	BandwidthThrottling=0,
	quotedIdentifiers=false,
	fetchSize=500,
	help=false,
	version=false,
	verbose=DEBUG,
	optionsFile='replicadb.conf',
	mode='complete',
	sentryDsn='https://[email protected]/5962725',
	sourceConnectionParams={},
	sinkConnectionParams={},
	sourceFileFormat='null',
	sinkFileformat='null'}
2023-07-03 08:49:46,653 INFO  Sentry:27 Sentry enabled
2023-07-03 08:49:47,608 DEBUG MySQLManager:348 Calculating the chunks size with this sql: SELECT  CEIL(count(*) / 5) chunk_size, count(*) total_rows FROM bugs.attach_data
2023-07-03 08:49:47,673 DEBUG MySQLManager:353 chunkSize: 30311 totalNumberRows: 151555
2023-07-03 08:49:47,687 INFO  SqlManager:373 Truncating sink table with this command: TRUNCATE TABLE attach_data
2023-07-03 08:49:48,400 INFO  ReplicaTask:35 Starting  TaskId-4
2023-07-03 08:49:48,400 INFO  ReplicaTask:35 Starting  TaskId-1
2023-07-03 08:49:48,400 INFO  ReplicaTask:35 Starting  TaskId-3
2023-07-03 08:49:48,400 INFO  ReplicaTask:35 Starting  TaskId-2
2023-07-03 08:49:48,400 INFO  ReplicaTask:35 Starting  TaskId-0
2023-07-03 08:49:48,943 INFO  SqlManager:128 TaskId-2: Executing SQL statement: SELECT * FROM bugs.attach_data LIMIT ? OFFSET ?
2023-07-03 08:49:48,950 INFO  SqlManager:128 TaskId-0: Executing SQL statement: SELECT * FROM bugs.attach_data LIMIT ? OFFSET ?
2023-07-03 08:49:48,950 INFO  SqlManager:128 TaskId-1: Executing SQL statement: SELECT * FROM bugs.attach_data LIMIT ? OFFSET ?
2023-07-03 08:49:48,950 INFO  SqlManager:128 TaskId-4: Executing SQL statement: SELECT * FROM bugs.attach_data LIMIT ? OFFSET ?
2023-07-03 08:49:48,957 INFO  SqlManager:128 TaskId-3: Executing SQL statement: SELECT * FROM bugs.attach_data LIMIT ? OFFSET ?
2023-07-03 08:49:48,968 DEBUG SqlManager:133 TaskId-2: Using fetchSize for next query: 500
2023-07-03 08:49:48,968 DEBUG SqlManager:133 TaskId-3: Using fetchSize for next query: 500
2023-07-03 08:49:48,968 DEBUG SqlManager:133 TaskId-1: Using fetchSize for next query: 500
2023-07-03 08:49:48,968 DEBUG SqlManager:133 TaskId-4: Using fetchSize for next query: 500
2023-07-03 08:49:48,968 DEBUG SqlManager:133 TaskId-0: Using fetchSize for next query: 500
2023-07-03 08:49:48,971 INFO  SqlManager:148 TaskId-0: With args: 30311, 0,
2023-07-03 08:49:48,971 INFO  SqlManager:148 TaskId-1: With args: 30311, 30311,
2023-07-03 08:49:48,971 INFO  SqlManager:148 TaskId-3: With args: 30311, 90933,
2023-07-03 08:49:48,971 INFO  SqlManager:148 TaskId-4: With args: 30311, 121244,
2023-07-03 08:49:48,971 INFO  SqlManager:148 TaskId-2: With args: 30311, 60622,
2023-07-03 08:49:50,925 WARN  ConnManager:188 Options source-columns and sink-columns are null, getting from Source ResultSetMetaData: id,thedata
2023-07-03 08:49:50,932 INFO  PostgresqlManager:168 Copying data with this command: COPY attach_data (id,thedata) FROM STDIN WITH DELIMITER e'\x1f' ENCODING 'UTF-8'
2023-07-03 08:49:56,373 INFO  PostgresqlManager:168 Copying data with this command: COPY attach_data (id,thedata) FROM STDIN WITH DELIMITER e'\x1f' ENCODING 'UTF-8'
2023-07-03 08:50:40,643 INFO  PostgresqlManager:168 Copying data with this command: COPY attach_data (id,thedata) FROM STDIN WITH DELIMITER e'\x1f' ENCODING 'UTF-8'
2023-07-03 08:51:43,430 INFO  PostgresqlManager:168 Copying data with this command: COPY attach_data (id,thedata) FROM STDIN WITH DELIMITER e'\x1f' ENCODING 'UTF-8'
2023-07-03 08:52:32,807 INFO  PostgresqlManager:168 Copying data with this command: COPY attach_data (id,thedata) FROM STDIN WITH DELIMITER e'\x1f' ENCODING 'UTF-8'
2023-07-03 08:56:06,350 ERROR ReplicaTask:69 ERROR in TaskId-0 inserting data to sink table: [ERROR: invalid input syntax for type bytea
  Where: COPY attach_data, line 2, column thedata: "RCS file: /repository/ghost/src/client_side.c,v
retrieving revision 1.44.2.4
diff -u -r1.44.2.4 clie..."]
2023-07-03 09:00:08,807 ERROR ReplicaTask:69 ERROR in TaskId-1 inserting data to sink table: [ERROR: extra data after last expected column
  Where: COPY attach_data, line 2: "73111��ࡱ�\N\N\N\N\N\N\N\N\N\N\N\N\N\N\N\N>\N\N��	\N\N\N\N\N\N\N\N\N\N\N\N\N\N\N�..."]
2023-07-03 09:01:22,591 ERROR ReplicaTask:69 ERROR in TaskId-2 inserting data to sink table: [ERROR: extra data after last expected column
  Where: COPY attach_data, line 4: "348304�PNG\r\n\n\N\N\N\rIHDR\N\NL\N\N\N\N\N�K\nX\N\N\NJtEXtSoftware\NRRDtool, Tobias Oe..."]
2023-07-03 09:02:54,320 ERROR ReplicaTask:69 ERROR in TaskId-4 inserting data to sink table: [ERROR: extra data after last expected column
  Where: COPY attach_data, line 2: "907571PK\N\\N\N\N!\Nv2�\N\NZ
                                                                \N\N\[Content_Types].xml �(�\N\N\N\N\N\N\N\N\N\N..."]
2023-07-03 09:07:23,519 ERROR ReplicaTask:69 ERROR in TaskId-3 inserting data to sink table: [ERROR: extra data after last expected column
  Where: COPY attach_data, line 3: "621129����\NJFIF\N\N`\N`\N\N��\NC\N
                                                                              \\n\..."]
2023-07-03 09:07:23,523 ERROR ReplicaDB:134 Got exception running ReplicaDB:
java.util.concurrent.ExecutionException: org.postgresql.util.PSQLException: ERROR: invalid input syntax for type bytea
  Where: COPY attach_data, line 2, column thedata: "RCS file: /repository/ghost/src/client_side.c,v
retrieving revision 1.44.2.4
diff -u -r1.44.2.4 clie..."
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
	at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
	at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:115) [ReplicaDB-0.15.0.jar:0.15.0]
	at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.15.0.jar:0.15.0]
Caused by: org.postgresql.util.PSQLException: ERROR: invalid input syntax for type bytea
  Where: COPY attach_data, line 2, column thedata: "RCS file: /repository/ghost/src/client_side.c,v
retrieving revision 1.44.2.4
diff -u -r1.44.2.4 clie..."
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675) ~[postgresql-42.3.7.jar:42.3.7]
	at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1263) ~[postgresql-42.3.7.jar:42.3.7]
	at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:1068) ~[postgresql-42.3.7.jar:42.3.7]
	at org.postgresql.core.v3.CopyInImpl.endCopy(CopyInImpl.java:49) ~[postgresql-42.3.7.jar:42.3.7]
	at org.replicadb.manager.PostgresqlManager.insertDataToTable(PostgresqlManager.java:134) ~[ReplicaDB-0.15.0.jar:0.15.0]
	at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) ~[ReplicaDB-0.15.0.jar:0.15.0]
	at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) ~[ReplicaDB-0.15.0.jar:0.15.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-07-03 09:07:23,631 INFO  ReplicaDB:54 Total process time: 1057100ms

To Reproduce Steps to reproduce the behaviour: config above

Source table DDL id = mediumint(9) PK thedata = longblob

Sink table DDL in = integer thedata = bytea

ReplicaDB configuration options-file. Expected behavior A clear and concise description of what you expected to happen.

Additional context

ScottChapman avatar Jul 03 '23 13:07 ScottChapman

Hi @ScottChapman

To open a new issue, please, follow the issue template:

Describe the bug A clear and concise description of what the bug is, with verbose log.

To Reproduce Steps to reproduce the behaviour:

  1. Source table DDL
  2. Sink table DDL
  3. ReplicaDB configuration options-file.

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here. Running environment (cloud, on premise, java version..), source and sink technologies (Oracle, MySQL, Postgres...)

osalvador avatar Jul 03 '23 13:07 osalvador

Hi @ScottChapman

To open a new issue, please, follow the issue template:

Describe the bug A clear and concise description of what the bug is, with verbose log.

To Reproduce Steps to reproduce the behaviour:

  1. Source table DDL
  2. Sink table DDL
  3. ReplicaDB configuration options-file.

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here. Running environment (cloud, on premise, java version..), source and sink technologies (Oracle, MySQL, Postgres...)

Sorry, updated above. I'll close the other one since both errors are in same run. Thanks!

ScottChapman avatar Jul 03 '23 14:07 ScottChapman

Hi @ScottChapman

I have made a fix that may solve your issue. Can you try the latest release? https://github.com/osalvador/ReplicaDB/releases/tag/v0.15.1

Regards!

osalvador avatar Jul 04 '23 10:07 osalvador

Yes, better. I do get the error but much less frequently. Is it possible to dump any information that might be helpful in identifying the data that is causing the problem for me?

ScottChapman avatar Jul 05 '23 21:07 ScottChapman

@osalvador - are the copy commands kept some place? I assume there is a tmp file for the SQL commands that are executed? Is it possible to preserve that for debugging purposes?

ScottChapman avatar Jul 08 '23 13:07 ScottChapman

OK. I think I narrowed it down to the data that is problematic. There is a bunch of unicode data in it. I can probably clean it up a bit so I can share it if you are interested.

ScottChapman avatar Jul 08 '23 17:07 ScottChapman

Hi @ScottChapman

How was it?

osalvador avatar Jul 14 '23 13:07 osalvador