moqui-framework
Issue with CSV Parsing for Embedded Quotes Using EntityDataLoader
When using EntityDataLoader to import CSV files, I encountered issues with CSV values containing embedded double quotes ("). The error encountered was:
{'message':'IOException reading next record: java.io.IOException: (line 3) invalid char between encapsulated token and delimiter (line 3) invalid char between encapsulated token and delimiter','errorName':'Internal Server Error','error':500,'path':'/apps/tools/Entity/DataImport/load'}
Example CSV Data:
co.example.bi.fact.OrderItemFulfillmentFact
orderId,orderItemSeqId,externalId,orderName,orderTypeId,productStoreId,salesChannelEnumId,entryDate,orderDate,shippingCharges,productId,itemDescription,
FAO10117,101,5669763023132,"#1010101240",SALES_ORDER,STORE,POS_SALES_CHANNEL,1705232954323,1705232896000,,10016,"\"And\" Pride Tank in Grey Mix",
Current Behavior:
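The parser throws an IOException on the backslash-escaped quotes (\") in the itemDescription value. No escape character is configured on the CSVFormat, so the first embedded quote is read as the end of the quoted value and the text that follows it is rejected.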
Proposed Solution:
To handle this, I found that using the withEscape method of CSVFormat to configure backslash as the escape character lets the parser read the embedded quotes:
// Backslash as the escape character; declared as char because withEscape takes a char, not a String
char escapeChar = '\\' as char
CSVFormat format = CSVFormat.newFormat(edli.csvDelimiter)
        .withCommentMarker(edli.csvCommentStart)
        .withQuote(edli.csvQuoteChar)
        .withSkipHeaderRecord(true) // TODO: remove this? does it even do anything?
        .withIgnoreEmptyLines(true)
        .withIgnoreSurroundingSpaces(true)
        .withEscape(escapeChar) // Added escape character support
CSVParser parser = format.parse(reader)
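To make the difference easier to check outside of Moqui, here is a minimal standalone Java sketch. It assumes Apache Commons CSV 1.x is on the classpath, and the class and method names (EscapeDemo, baseFormat, parseRow) are made up for illustration and are not part of EntityDataLoader. It hard-codes comma and double quote in place of edli.csvDelimiter and edli.csvQuoteChar, then parses a shortened version of the failing row with and without the escape character.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class, not part of moqui-framework.
class EscapeDemo {

    // Roughly mirrors the loader's format; delimiter and quote are assumed values.
    static CSVFormat baseFormat() {
        return CSVFormat.newFormat(',')
                .withQuote('"')
                .withIgnoreEmptyLines(true)
                .withIgnoreSurroundingSpaces(true);
    }

    // Parses the given CSV text and returns all field values in order.
    static List<String> parseRow(String row, CSVFormat format) throws IOException {
        List<String> values = new ArrayList<>();
        try (CSVParser parser = CSVParser.parse(row, format)) {
            for (CSVRecord record : parser) {
                for (String value : record) {
                    values.add(value);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Raw CSV text: 10016,"\"And\" Pride Tank in Grey Mix"
        String row = "10016,\"\\\"And\\\" Pride Tank in Grey Mix\"";

        try {
            // No escape character configured: expected to fail on the embedded quote.
            parseRow(row, baseFormat());
            System.out.println("parsed without an escape character");
        } catch (IOException | RuntimeException e) {
            // Depending on the library version the failure may surface as a
            // checked or an unchecked exception, so both are caught here.
            System.out.println("without escape: " + e.getMessage());
        }

        // Backslash configured as escape character: the embedded quotes are kept.
        System.out.println(parseRow(row, baseFormat().withEscape('\\')));
    }
}
```

One thing to consider before merging: as far as I can tell, once an escape character is set it applies to every field, so existing data files that contain a backslash followed by n, r or t would be read as newline, carriage return or tab. Making the escape character configurable on the loader, like the delimiter and quote character already are, might be the safer default.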
If this proposed solution looks good, I'd be happy to create a pull request for it.