marklogic-contentpump
Save/export rows that fail delimited text ingest due to unescaped quotes
We're encountering an issue similar to https://github.com/marklogic/marklogic-contentpump/issues/68 with files that are tab-delimited but contain unescaped quotes:
Sample:
11:16:43.614 [pool-1-thread-1] WARN c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter
02020201 "opt out" service Observation DOMAIN DOMAIN
It would be great if the failed records were written to a separate file or to the log, so we could quickly see which formatting error caused each failure during ingest and fix it.
- Steps to reproduce the bug - ingest a tab-delimited file containing this row:
02020201 "opt out" service Observation DOMAIN DOMAIN
- Input and Output -
Sample output:
11:16:43.614 [pool-1-thread-1] WARN c.m.contentpump.DocumentMapper - Skipped record: () in file:/homes/local/projects/data-hub/data/omop/all/CONCEPT/CONCEPT.csv at line 1999360, reason: invalid char between encapsulated token and delimiter
- Environment - RedHat, MarkLogic 9.0-3, MLCP 9.0-4
- Suggest a fix - save the skipped lines in a separate file or log so we can inspect what kind of formatting error is encountered
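Until something like this exists in MLCP, a possible workaround is to recover the skipped rows from the log after the fact. The sketch below is not part of MLCP: `parse_skipped` and `extract_lines` are hypothetical helper names, and the regular expression is modeled only on the single WARN line quoted above, so other MLCP versions may need a different pattern. It also assumes the reported line number is a 1-based physical line in the input file, which may not hold if quoted fields span several lines.

```python
import re
from collections import defaultdict

# Pattern modeled on the WARN line quoted in this issue (an assumption,
# not a documented MLCP log format).
SKIPPED = re.compile(
    r"Skipped record: .*? in file:(?P<path>\S+) "
    r"at line (?P<line>\d+), reason: (?P<reason>.*)$"
)


def parse_skipped(log_lines):
    """Return {path: {line_number: reason}} for each skipped-record warning."""
    skipped = defaultdict(dict)
    for entry in log_lines:
        match = SKIPPED.search(entry)
        if match:
            number = int(match["line"])
            skipped[match["path"]][number] = match["reason"].strip()
    return skipped


def extract_lines(data_lines, wanted):
    """Yield (line_number, reason, text) for the 1-based line numbers in `wanted`.

    `wanted` maps line numbers to reasons, as produced by parse_skipped().
    """
    for number, text in enumerate(data_lines, start=1):
        if number in wanted:
            yield number, wanted[number], text.rstrip("\n")
```

For example, with the MLCP output saved to a file (here called `mlcp.log`, an assumed name), the skipped rows can be written out like this:

```python
with open("mlcp.log", encoding="utf-8") as log:
    skipped = parse_skipped(log)

for path, wanted in skipped.items():
    with open(path, encoding="utf-8") as data, \
         open(path + ".skipped", "w", encoding="utf-8") as out:
        for number, reason, text in extract_lines(data, wanted):
            out.write(f"{number}\t{reason}\t{text}\n")
```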