node-csv-parse

trim() doesn't remove all whitespace characters

Open sedenardi opened this issue 4 years ago • 1 comments

Describe the bug

In the documentation for trim(), it's mentioned that the function will trim all whitespace around delimiters identically to \s:

The characters interpreted as whitespaces are identical to the \s meta character in regular expressions

However, in practice, and from inspecting the source of __isCharTrimable(), I can see that it only trims the 5 example characters from the docs, rather than the full list of characters that \s actually matches. According to the MDN JavaScript docs, and in practice (see the Node.js example below), some whitespace characters that \s matches should be removed, but aren't.
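To make the gap concrete, here is a minimal sketch that lists the code points \s matches but a five-character check would miss. It assumes the five characters are space, tab, carriage return, line feed, and form feed; that set and the names ASSUMED_TRIMMABLE and missedByAssumedSet are illustrative, not taken from the library.

```javascript
// Assumption: the five characters handled by trim are
// space, tab, carriage return, line feed, and form feed.
const ASSUMED_TRIMMABLE = new Set([0x20, 0x09, 0x0d, 0x0a, 0x0c]);

// Hypothetical helper: scan the Basic Multilingual Plane and collect every
// code point that /\s/ matches but the assumed five-character set does not.
function missedByAssumedSet() {
  const missed = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    const ch = String.fromCharCode(cp);
    if (/\s/.test(ch) && !ASSUMED_TRIMMABLE.has(cp)) {
      missed.push('U+' + cp.toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  return missed;
}

console.log(missedByAssumedSet());
```

The printed list includes U+3000 (IDEOGRAPHIC SPACE) and U+200A (HAIR SPACE), the two characters used in the reproduction.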

To Reproduce

This example shows that whitespace characters not listed in the trim() docs aren't removed from the values by the trim option, but they are removed when using the regex \s.

const csvParse = require('csv-parse');

/**
 * First whitespace is 'IDEOGRAPHIC SPACE' (U+3000)
 * Second whitespace is 'HAIR SPACE' (U+200A)
 */
const input = `ID,First,Last
1234, John,Smith
5678,Jane, Doe`;

csvParse(input, {
  columns: true,
  trim: true
}, (err, res) => {
  if (err) { throw err; }
  console.log(res);
  const trimmed = res.map((r) => {
    return {
      ...r,
      First: r.First.replace(/^\s+|\s+$/g, ''),
      Last: r.Last.replace(/^\s+|\s+$/g, '')
    };
  });
  console.log(trimmed);
});
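As a stopgap, the manual post-processing in the callback above could be generalized to every column. This is only a sketch of a workaround, not part of csv-parse: trimAllFields is a hypothetical helper, and it assumes records are plain objects, as they are with columns: true.

```javascript
// Hypothetical helper: trim every string field of a parsed record using the
// full \s character class, leaving non-string values untouched.
function trimAllFields(record) {
  return Object.fromEntries(
    Object.entries(record).map(([key, value]) => [
      key,
      typeof value === 'string' ? value.replace(/^\s+|\s+$/g, '') : value,
    ])
  );
}

// Escapes are used here so the whitespace is visible in the source:
// \u3000 is IDEOGRAPHIC SPACE, \u200A is HAIR SPACE.
const sample = [
  { ID: '1234', First: '\u3000John', Last: 'Smith' },
  { ID: '5678', First: 'Jane', Last: '\u200ADoe' },
];
console.log(sample.map(trimAllFields));
```

Inside the callback this would replace the hand-written map with res.map(trimAllFields).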

Additional context

This isn't a case where the code is necessarily broken or incorrect, as it's correctly matching the 5 whitespace characters mentioned in the docs. However, the line in the docs

identical to the \s meta character in regular expressions

incorrectly describes what the output will be.

sedenardi, Dec 21 '20 22:12

This was affecting me as well. I resolved it with my own trim, but then found this issue.

gillyspy, Jun 27 '21 17:06