
pgloader 3.6.7 doesn't continue on fatal errors (ex. non whitespace after quoted data) contrary to pgloader 3.4.1

Open Kamal-learner-24 opened this issue 1 year ago • 1 comments

Hello everyone,

Let me explain our problem.

Recently, we migrated our solution from Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1 to Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7.

In the old system (Red Hat 7.5 with PostgreSQL 9.6.9 and pgloader 3.4.1), when I load a CSV file of 478 894 lines (14 of which contain errors) with a .LOAD command, pgloader 3.4.1 loads 478 880 lines. It runs as expected and continues loading when it encounters these errors.

In the new system (Rocky Linux 8.9 with PostgreSQL 13.14 and pgloader 3.6.7), when I load the same CSV file with the same .LOAD command, pgloader 3.6.7 loads only 183 569 lines. It does not run as expected and seems to stop loading when it encounters these errors.

Here is the .LOAD command:

LOAD CSV
     FROM /inputs/data/F024 WITH ENCODING UTF8
          (
            user_id [null if blanks],
            user_name_first [null if blanks],
            user_name_last [null if blanks]
          )
     INTO postgresql:///db_rec_dv?cpy.cpy_cso_user_base(user_id, user_name_first, user_name_last)
     WITH truncate,
          fields optionally enclosed by '"',
          fields terminated by ',',
          prefetch rows = 50000
      SET client_encoding to 'utf8',
          work_mem to '512MB',
          standard_conforming_strings to 'on';

Here is the error I get:

2024-08-08T13:47:09.233005+01:00 ERROR non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b
2024-08-08T13:47:09.233005+01:00 FATAL non whitespace after quoted data #<CSV-READER LINE-IDX:2 CHARACTER-LINE-IDX:22 CHARACTER-IDX:793 "byER6Vvdtb," {1005C0E263}> b

Here is an extract of the failing line (a double quote is missing): "11","Colyneߌڢ,"Test"

Thank you for your help

Best regards,

Kamal

Kamal-learner-24, Aug 16 '24 10:08

Yes, sorry, but I think you are relying on buggy behaviour, and the bug in question was fixed six years ago. I'd propose fixing the data errors in the CSV files. If I read that right, the csv-reader tells you the faulty lines (LINE-IDX).

svantevonerichsen6906, Aug 21 '24 09:08
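
One possible way to find the malformed rows before loading is to run the file through a strict CSV parser. The following is a minimal Python sketch and is not part of pgloader. The function name `find_bad_lines` and the sample rows are hypothetical, and the sketch assumes one record per physical line (no newlines embedded inside quoted fields).

```python
import csv


def find_bad_lines(lines, delimiter=",", quotechar='"'):
    """Return (line_number, message) for each line the strict parser rejects.

    Assumption: every record fits on one physical line.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            # strict=True makes the csv module raise csv.Error on malformed
            # quoting instead of silently accepting it.
            list(csv.reader([line], delimiter=delimiter,
                            quotechar=quotechar, strict=True))
        except csv.Error as exc:
            bad.append((number, str(exc)))
    return bad


# Hypothetical sample: the second row lacks the closing quote after the
# second field, similar to the failing line quoted in the issue.
sample = [
    '"10","Alice","Test"\n',
    '"11","Colyne,"Test"\n',
    '"12","Bob","Test"\n',
]

for number, message in find_bad_lines(sample):
    print(f"line {number}: {message}")
```

For a real file, the same function could be called on an open file object, for example `find_bad_lines(open(path, encoding="utf-8", newline=""))`, where `path` is whatever CSV is about to be loaded. The reported line numbers can then be used to correct the quoting by hand.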