pgloader icon indicating copy to clipboard operation
pgloader copied to clipboard

CSV header with newline breaks import

Open infojunkie opened this issue 3 years ago • 0 comments

The included CSV file has a newline in one of the header fields. This causes the import to fail with

FATAL non whitespace after quoted data [...]

When removing the newline in the header, the import succeeds. I expect the import to succeed in both cases.

  • [x] pgloader --version
pgloader version "3.6.2c52da1"
compiled with SBCL 2.1.11.debian
  • [x] did you test a fresh compile from the source tree?

  • [x] did you search for other similar issues?

  • [x] how can I reproduce the bug?

load csv
    from inline
    (
        noc,
        occupation_title,
        esdc_wage_rate_low_2021,
        esdc_wage_rate_median_2021,
        esdc_wage_rate_high_2021,
        calculated_median_annual_salary_2021,
        source_information
    )
    into postgresql:///pgloader?test_table
    with
        skip header = 1,
        fields optionally enclosed by '"',
        fields terminated by ','

    before load do
    $$ drop table if exists test_table; $$,
    $$ create table if not exists test_table (
          noc text primary key,
          occupation_title text,
          esdc_wage_rate_low_2021 text,
          esdc_wage_rate_median_2021 text,
          esdc_wage_rate_high_2021 text,
          calculated_median_annual_salary_2021 text,
          source_information text
    );
    $$
;

NOC,"Occupation Title","ESDC Wage Rate Low
2021","ESDC Wage Rate Median 2021","ESDC Wage Rate High 2021","Calculated Median Annual Salary 2021","Source Information"
0000,Title,1,2,3,4,"Data field
with newline is ok"
  • [x] pgloader output you obtain
2022-07-02T17:16:32.004001-07:00 LOG pgloader version "3.6.2c52da1"
2022-07-02T17:16:32.004001-07:00 LOG Data errors in '/tmp/pgloader/'
2022-07-02T17:16:32.004001-07:00 LOG Parsing commands from file #P"/home/kratib/src/quartech/workbc-ssot/migration/newline-header.load"
2022-07-02T17:16:32.180009-07:00 WARNING Target table "\"public\".\"test_table\"" has 1 indexes defined against it.
2022-07-02T17:16:32.180009-07:00 WARNING That could impact loading performance badly.
2022-07-02T17:16:32.180009-07:00 WARNING Consider the option 'drop indexes'.
2022-07-02T17:16:32.204010-07:00 ERROR non whitespace after quoted data #<CSV-READER LINE-IDX:0 CHARACTER-LINE-IDX:8 CHARACTER-IDX:8 "2021," {1009727C23}> E
2022-07-02T17:16:32.204010-07:00 FATAL non whitespace after quoted data #<CSV-READER LINE-IDX:0 CHARACTER-LINE-IDX:8 CHARACTER-IDX:8 "2021," {1009727C23}> E
2022-07-02T17:16:32.208010-07:00 LOG report summary reset
             table name     errors       rows      bytes      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0                     0.000s
            before load          0          2                     0.048s
-----------------------  ---------  ---------  ---------  --------------
  "public"."test_table"          2          0                     0.016s
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          0          1                     0.008s
COPY Threads Completion          0          2                     0.016s
-----------------------  ---------  ---------  ---------  --------------
      Total import time          2          0                     0.024s
  • [x] data that is being loaded, if relevant

No data is loaded.

  • [x] How the data is different from what you expected, if relevant

I expect the newline in the header to be processed without causing an error.

infojunkie avatar Jul 03 '22 00:07 infojunkie