PapaParse

Missing quotes when parsing.

Open dilizarov opened this issue 6 years ago • 4 comments

I've got the following two pieces of data in Excel:

Hi - need the forecasting documents for ColorTokens- they do not have a master payer, but want to do an EDP. Not sure if they need the master payer to do the EPD. Here are their IDs: 096742272934 070181106296 244882251506 602341081844 664991688287 681262709514 816618564346 822256776645 Deferred to CE Cost Optimization

and

"This is" is not a movie"

When you copy this data from Excel, it wraps the first big block in quotes and this is the string data I see:

"Hi - need the forecasting documents for ColorTokens- they do not have a master payer, but want to do an EDP. Not sure if they need the master payer to do the EPD.

Here are their IDs: 096742272934 070181106296 244882251506 602341081844 664991688287 681262709514 816618564346 822256776645

Deferred to CE Cost Optimization" "This is" is not a movie"

JSON.stringify shows this: ""Hi - need the forecasting documents for ColorTokens- they do not have a master payer, but want to do an EDP. Not sure if they need the master payer to do the EPD. \n\nHere are their IDs: \n096742272934\n070181106296\n244882251506\n602341081844\n664991688287\n681262709514\n816618564346\n822256776645\n\nDeferred to CE Cost Optimization"\t"This is" is not a movie""

The issue is that when I take this data and run it through Papaparse like so: Papa.parse(data, { delimiter: "\t" }), the surrounding quotes are dropped.

This makes sense for the larger block of text - Excel wrapped the larger block in quotes for the CSV to state that it is one cell's contents.

For the smaller text, though, the surrounding quotes are dropped as well and we now see This is" is not a movie, which is incorrect. Is there a config to resolve this?

When I try escaping the first and last quote in my strings, the problem is solved for the smaller text and I see the quotes, but the larger block of text then also begins and ends with " in the output, which isn't right.

dilizarov avatar Oct 02 '19 23:10 dilizarov

Is that snapshot of the stringify correct, or is all of that inside single quotes instead of double quotes? Either way, it does not look like valid CSV, since the quote characters inside the field are not properly escaped. It should be something more like var data = '"Hi...\n..."\t"""This is"" is not a movie"'. Since the quotes are not escaped by doubling them, the data is malformed and the correct solution is to report an error. I think the change I am working on would cause it to return '"this is" is not a movie"' and report an error.
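For reference, the doubling rule described above can be sketched as a small helper. This is only an illustration: escapeField is a hypothetical name, not a PapaParse function, and the cell values are shortened stand-ins for the ones in the original post.

```javascript
// Hypothetical helper (not part of PapaParse): quote one cell value so that a
// CSV/TSV parser can read it back unambiguously.
function escapeField(value, delimiter = ",") {
  // A field needs quoting if it contains a quote, the delimiter, or a line break.
  const needsQuoting =
    value.includes('"') ||
    value.includes(delimiter) ||
    value.includes("\n") ||
    value.includes("\r");
  if (!needsQuoting) return value;
  // Double every embedded quote, then wrap the whole field in quotes.
  return '"' + value.replace(/"/g, '""') + '"';
}

// Shortened stand-ins for the two cells from the original post.
const cells = ["first line\nsecond line", '"This is" is not a movie"'];
const line = cells.map((cell) => escapeField(cell, "\t")).join("\t");
console.log(JSON.stringify(line));
```

Text escaped this way should parse back to the original cell values, which is not what the clipboard text in this issue looks like.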

jseter avatar Oct 23 '19 19:10 jseter

@jseter which change will cause this to be reported as an error?

pokoli avatar Oct 24 '19 06:10 pokoli

Without any changes today, this is what I see from the example provided.

{
  data: [ [ 'Hi - need the forecasting documents for ColorTokens- they do not have a master payer, but want to do an EDP. Not sure if they need the master payer to do the EPD. \n\nHere are their IDs: \n096742272934\n070181106296\n244882251506\n602341081844\n664991688287\n681262709514\n816618564346\n822256776645\n\nDeferred to CE Cost Optimization', 'This is" is not a movie' ] ],
  errors: [ {
    type: 'Quotes',
    code: 'InvalidQuotes',
    message: 'Trailing quote on quoted field is malformed',
    row: 0,
    index: 327
  } ],
  meta: {
    delimiter: '\t',
    linebreak: '\n',
    aborted: false,
    truncated: false,
    cursor: 351
  }
}

So even without changes, it is properly reporting that the data is malformed.
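A caller can check for this by inspecting the errors array of the result. The sketch below works on a plain object shaped like the output shown above, so it does not call PapaParse itself; rowsWithQuoteErrors is a hypothetical helper name and the data row is abbreviated.

```javascript
// Hypothetical helper: return the indices of rows that carry a quote-related
// error, given a result object shaped like the one printed above.
function rowsWithQuoteErrors(result) {
  const rows = new Set();
  for (const err of result.errors) {
    if (err.type === "Quotes") rows.add(err.row);
  }
  // Sort numerically so the caller gets rows in document order.
  return [...rows].sort((a, b) => a - b);
}

// Abbreviated copy of the result from this thread (first cell shortened).
const result = {
  data: [["Hi - need the forecasting documents ...", 'This is" is not a movie']],
  errors: [
    {
      type: "Quotes",
      code: "InvalidQuotes",
      message: "Trailing quote on quoted field is malformed",
      row: 0,
      index: 327,
    },
  ],
  meta: { delimiter: "\t", linebreak: "\n", aborted: false, truncated: false, cursor: 351 },
};

const suspectRows = rowsWithQuoteErrors(result);
console.log(suspectRows);
```

Rows flagged this way are the ones whose cell values may have lost or gained quote characters and are worth re-checking against the source text.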

The change I mentioned is still in progress: it is the "strictQuote" option, which we deferred to a separate PR. It would still report an error here, but it would treat the invalid quoted field as a non-quoted field.

jseter avatar Oct 24 '19 12:10 jseter

Ah... so Excel is at fault in this case. I suppose I would then have to parse through the data myself and see where the invalid quotes are and handle them myself.
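One way to handle this by hand is a small lenient parser for the clipboard text. The sketch below is a heuristic, not PapaParse behavior: parseClipboardTsv is a hypothetical name, and it assumes a field counts as quoted only when its closing quote is immediately followed by a tab, a line break, or the end of the input. Anything else that starts with a quote is read as raw text up to the next tab or line break.

```javascript
// Hypothetical helper (not part of PapaParse): parse tab-separated clipboard
// text, treating a malformed quoted field as a plain, unquoted field.
function parseClipboardTsv(text) {
  const n = text.length;
  const rows = [];
  let row = [];
  let i = 0;

  // Read an unquoted field: everything up to the next tab or line break.
  function readRaw(start) {
    let j = start;
    while (j < n && text[j] !== "\t" && text[j] !== "\n" && text[j] !== "\r") j++;
    return { value: text.slice(start, j), end: j };
  }

  // Try to read a well-formed quoted field. Returns null when a lone quote is
  // not followed by a tab, a line break, or the end of the input.
  function readQuoted(start) {
    let j = start + 1;
    let value = "";
    while (j < n) {
      if (text[j] === '"') {
        if (text[j + 1] === '"') { value += '"'; j += 2; continue; }
        const next = text[j + 1];
        if (next === undefined || next === "\t" || next === "\n" || next === "\r") {
          return { value, end: j + 1 };
        }
        return null; // stray quote in the middle of the field
      }
      value += text[j];
      j++;
    }
    return null; // no closing quote found
  }

  while (i <= n) {
    const field = (text[i] === '"' && readQuoted(i)) || readRaw(i);
    row.push(field.value);
    i = field.end;
    if (i >= n) { rows.push(row); break; }
    if (text[i] === "\t") { i++; continue; }
    // Line break: consume "\r\n" as a single break, then start a new row.
    if (text[i] === "\r" && text[i + 1] === "\n") i++;
    i++;
    rows.push(row);
    row = [];
    if (i >= n) break; // trailing line break, no extra empty row
  }
  return rows;
}

// Shortened stand-in for the clipboard text in this issue.
const sample = '"line one\nline two"\t"This is" is not a movie"';
const parsed = parseClipboardTsv(sample);
console.log(parsed);
```

The heuristic is ambiguous by nature: a cell whose literal content is "a" (quotes included) cannot be told apart from a quoted cell containing a, so it would come back without its quotes.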

dilizarov avatar Oct 25 '19 07:10 dilizarov