Save/export rows that failed ingest (delimited text ingest fails on unescaped quotes)

Open • janmichaelyu opened this issue 6 years ago • 1 comment

We're encountering an issue similar to https://github.com/marklogic/marklogic-contentpump/issues/68 with files that are tab-delimited but contain unescaped quotes:

Sample log output, followed by the row that triggered it:

11:16:43.614 [pool-1-thread-1] WARN  c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter

02020201 "opt out" service Observation   DOMAIN   DOMAIN 

It would be great if the failed records were written to a separate file or to the log. That would let us quickly examine what went wrong during the ingest, identify the kind of formatting error involved, and fix it.

janmichaelyu • Jul 12 '18 21:07

  • Steps to reproduce the bug: ingest a tab-delimited file containing the row: 02020201 "opt out" service Observation DOMAIN DOMAIN
  • Input and output: sample output: 11:16:43.614 [pool-1-thread-1] WARN c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter
  • Environment: RedHat, MarkLogic 9.0-3, MLCP 9.0-4
  • Suggested fix: save the skipped lines to a separate file or to the log so we can inspect what kind of formatting error was encountered
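
Until something like this exists in MLCP, a possible workaround is to post-process the log. The sketch below is not part of MLCP: it is a minimal Python script that assumes the warning format shown in the sample output, assumes the reported line number is 1-based and counts physical lines in the source file, and assumes file paths contain no spaces. The function names `parse_skips`, `extract_rows`, and `export_skipped` are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, unquote

# Assumed warning format, taken from the sample log line in this issue:
#   ... Skipped record: () in file:/path/to/file at line N, reason: <text>
SKIP_RE = re.compile(
    r"Skipped record: .*? in (?P<uri>file:\S+) at line (?P<line>\d+), "
    r"reason: (?P<reason>.*)$"
)


def parse_skips(log_lines):
    """Map each source file path to {line_number: reason}."""
    skips = defaultdict(dict)
    for entry in log_lines:
        match = SKIP_RE.search(entry.rstrip("\n"))
        if match:
            # Turn the file: URI from the log into a plain filesystem path.
            path = unquote(urlparse(match.group("uri")).path)
            skips[path][int(match.group("line"))] = match.group("reason")
    return skips


def extract_rows(source_lines, wanted):
    """Yield (line_number, reason, text) for the wanted line numbers.

    Assumption: line numbers are 1-based physical lines.
    """
    for number, text in enumerate(source_lines, start=1):
        if number in wanted:
            yield number, wanted[number], text.rstrip("\n")


def export_skipped(log_path, out_path):
    """Write every skipped row, with its location and reason, to out_path."""
    with open(log_path, encoding="utf-8") as log:
        skips = parse_skips(log)
    with open(out_path, "w", encoding="utf-8") as out:
        for path, wanted in skips.items():
            with open(path, encoding="utf-8", errors="replace") as src:
                for number, reason, text in extract_rows(src, wanted):
                    out.write(f"{path}:{number}\t{reason}\n{text}\n")
```

Calling `export_skipped("mlcp.log", "skipped_rows.txt")` (both file names are placeholders) would collect each skipped row next to its reported reason. If a file contains quoted fields that span multiple lines, the reported line number may not line up with the physical line, so the extracted rows should be checked by eye.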

janmichaelyu • Jul 12 '18 21:07