moqui-framework

Issue with CSV Parsing for Embedded Quotes Using EntityDataLoader

Open puru-khedre opened this issue 6 months ago • 0 comments

When importing CSV files with EntityDataLoader, I ran into a parsing failure on values that contain backslash-escaped double quotes (\"). The error was:

{'message':'IOException reading next record: java.io.IOException: (line 3) invalid char between encapsulated token and delimiter (line 3) invalid char between encapsulated token and delimiter','errorName':'Internal Server Error','error':500,'path':'/apps/tools/Entity/DataImport/load'}

Example CSV Data:

co.example.bi.fact.OrderItemFulfillmentFact
orderId,orderItemSeqId,externalId,orderName,orderTypeId,productStoreId,salesChannelEnumId,entryDate,orderDate,shippingCharges,productId,itemDescription,
FAO10117,101,5669763023132,"#1010101240",SALES_ORDER,STORE,POS_SALES_CHANNEL,1705232954323,1705232896000,,10016,"\"And\" Pride Tank in Grey Mix",

Current Behavior:

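The parser throws an IOException on the embedded quotes. With no escape character configured, the backslash in the itemDescription value is read as a literal character, the quote after it closes the field, and the next character (A) is rejected as an invalid character between the closing quote and the delimiter.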

Proposed Solution:

To handle this, I found that setting an escape character with the withEscape method of CSVFormat lets the parser accept backslash-escaped quotes inside quoted values:

// Backslash as the escape character. The explicit char type is deliberate:
// a single-quoted Groovy literal is a String, and withEscape expects a char.
char escapeChar = '\\' as char
CSVFormat format = CSVFormat.newFormat(edli.csvDelimiter)
        .withCommentMarker(edli.csvCommentStart)
        .withQuote(edli.csvQuoteChar)
        .withSkipHeaderRecord(true) // TODO: remove this? does it even do anything?
        .withIgnoreEmptyLines(true)
        .withIgnoreSurroundingSpaces(true)
        .withEscape(escapeChar) // Added escape character support
CSVParser parser = format.parse(reader)
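
To check the change outside Moqui, here is a minimal standalone sketch in Java. It assumes Apache Commons CSV 1.x is on the classpath. The class name EscapeDemo, the helper methods, and the hard-coded delimiter and quote characters are hypothetical stand-ins for the edli fields, and the row is a shortened version of the failing line above. The comment marker and header options are left out because they do not affect this case.

```java
import java.io.IOException;
import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class EscapeDemo {
    // Shortened sample row: orderId, productId, itemDescription.
    // The description contains backslash-escaped double quotes.
    static final String ROW =
            "FAO10117,10016,\"\\\"And\\\" Pride Tank in Grey Mix\"\n";

    // Assumed stand-in for the loader's format: comma delimiter,
    // double-quote quoting, and no escape character configured.
    static CSVFormat baseFormat() {
        return CSVFormat.newFormat(',')
                .withQuote('"')
                .withIgnoreEmptyLines(true)
                .withIgnoreSurroundingSpaces(true);
    }

    // Parses ROW with the given format and returns its last field.
    static String lastField(CSVFormat format) throws IOException {
        try (CSVParser parser = format.parse(new StringReader(ROW))) {
            CSVRecord record = parser.getRecords().get(0);
            return record.get(record.size() - 1);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            lastField(baseFormat());
        } catch (Exception e) {
            // Expected: "invalid char between encapsulated token and delimiter"
            System.out.println("without escape: " + e.getMessage());
        }
        // With a backslash escape, \" is read as a literal quote.
        System.out.println("with escape: "
                + lastField(baseFormat().withEscape('\\')));
    }
}
```

Under these assumptions, the first call should fail with the same message as the import, and the second should return the description with the literal quotes around And preserved.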

If this solution looks good, I'd be happy to create a pull request for it.

puru-khedre, Aug 16 '24 13:08