node-csv icon indicating copy to clipboard operation
node-csv copied to clipboard

csv-stringify doesn't automatically quote strings with \r line endings

Open denexapp opened this issue 4 years ago • 6 comments

Describe the bug

The csv-stringify package doesn't treat \r symbol as a line ending. So it doesnt automatically quotes strings with this line ending. It may lead to unexpected behavior when other apps read generated csv's.

I use csv-stringify 5.5.0

To Reproduce

This is an example to reproduce the issue. I haven't tested it but it should work

const stringify = require('csv-stringify/lib/sync')

const columns = [{ key: 'testColumn' }]
const records = [{ testColumn: 'test \r test' }]

const data = stringify(records, {
  eof: false,
  header: true,
  delimiter: '\t',
  recordDelimeter: 'unix',
  columns,
  // uncomment next line for a workaround
  // quotedMatch: '\r'   
})

console.log(data)

Produced tsv will have a record with an unqouted value with line endings

Additional context

  • This \r line ending is not common but it exists. It is supported by code editors and apps
  • There's a workaround in the example above

denexapp avatar Aug 09 '21 13:08 denexapp

Agreed that \r is a common delimiter, it got its own shortcut in csv-stringify as mac, just like windows, unix and a few others. However, I am not in favor of giving it a special treatment by surrounded a field with quote when it contains the character. A record delimiter can be any character, a single or multiple characters, and we shall treat them all as equal.

In your example, you generated a unix file, with \n, there is no reason to treat \n as special. Note, this argument is from a library standpoint with the default options. In such case, your solution is appropriate to me.

If this is a common use case, we could create a new option just for this and leave the default behavior. This is the first time the question is raised. We will need some backup argument to support the feature. TSV not handling the character is a good one if we prove the behavior to be true.

wdavidw avatar Aug 11 '21 22:08 wdavidw

I see your point

I used to think that the library quotes all strings with windows and unix delimeters no matter what record_delimiter is. I did some testing and it actually quotes strings with characters from record_delimeter only.

I guess, I haven't had this issue before because Windows's \r\n delimeter contains Unix's \n delimeter, so it quoted strings both with Windows and Unix line endings, until our user tried to pass a text with old Mac endings. I think this is why other users of this library never reported this issue before, since mac line endings is much less common than windows and unix ones, which work nice with default settings.

However

As a developer I'd like my users to have a better experience. I have no control over the apps they use to see and parse CSV/TSV but I want them to see files correctly. Sadly, I can not specify what style of line ending a csv file uses inside the file. So an app takes a guess what line ending style is being used and in my case that guess is wrong. So I have to quote all strings with \r' and \nsymbols. And I have to usequotedMatch: '\r'` everytime I generate csv.

So if I have to pass this option everytime, maybe it's better for the library to have this behavior by default? It is defenitely not the most common use case, but I think it's a pitfall and other users may fall into it without noticing. And I think it's better to have a protection from this pitfall by default, with a way to override or turn this behavior off.

denexapp avatar Aug 17 '21 19:08 denexapp

Adding to this, I had a downstream consumer of one of my projects hit this issue. My use case is exporting data from a DB into CSV/TSV. Interestingly, it seems that csv-stringify did quote \r by default in at least 1.1.2. This behavior was then changed at some point so that it no longer quotes. My test code is:

const csvStringify = require('csv-stringify/lib/sync');
console.log(csvStringify([['a', 'b\rc', 'd']], { delimiter: '\t', eof: false }));

On 1.1.2 the output is 'a\t"b\rc"\td' and on 5.5.0 it is 'a\tbc\r\td'.

We will need some backup argument to support the feature. TSV not handling the character is a good one if we prove the behavior to be true.

Leaving \r unquoted in TSV or CSV seem to produce a broken TSV / CSV document. Attempting to import the unquoted version into Google Sheets or Numbers (I do not have Excel so cannot test that) causes the element following the \r to end up bleeding onto a new row, as opposed to the data being stored on a single line as expected (e.g. the above data ends up looking like:

a	b
c	d

as opposed to:

a	b\rc	d

MasterOdin avatar Sep 24 '21 03:09 MasterOdin

If your line ending is not \r such as the default \n, then there is no reason to quote fields containing \r by default. It is treated as any normal character. The solutions I can think of are to quote all fields or creating a new option dedicated to this.

wdavidw avatar Oct 21 '21 18:10 wdavidw

Given these two files (produced using above code example): unquoted_cr.csv (using quotedMatch: '\r') quoted_cr.csv (produced with default settings)

How do they open on Windows / Linux in popular spreadsheet software? On my machine (macOS), the quoted_cr.csv opens the data as [['a', 'b\rc', 'd']] export, and the unquoted_cr.csv opens as [['a', 'b'], ['c', 'd']]. This is consistent across Excel on Microsoft 365, Google Sheets, and Numbers.

MasterOdin avatar Oct 21 '21 23:10 MasterOdin

We never fully addressed the format expected by spreadsheets. This shall probably go through the introduction of a new option and it is currently achieved by combining multiple existing options.

wdavidw avatar Oct 22 '21 09:10 wdavidw