The trim modifier should remove whitespace from isspace(3) not just the space character
The function xo_trim_ws should remove white space according to isspace(3), not just space characters (the fix is not straightforward, so reporting it as a bug).
For example, if passing tab characters (the white space is just a literal tab):
$ xo "{t:description}" " squash this! " --libxo json
"description":" squash this! "
Related to #101.
Trimming space and tab makes sense, but I'm not sure isspace() is suitable. Does trimming '\r' and '\n' give a suitable behavior?
Thanks, Phil
Trimming space and tab makes sense, but I'm not sure isspace() is suitable. Does trimming '\r' and '\n' give a suitable behavior?
Ideally, yes.
My original thinking was when trimming user-input data (e.g., a description field). Focusing solely on JSON, for example, I would expect it to behave just like ES5's String.prototype.trim(), where isspace(3) was a good first start. However, after checking out the development branch (I'll close #101 by the way), I would even consider whitespace all these:
https://github.com/v8/v8/blob/f8b4967b71b8d74f692ff41fbfd8f3edff5a5b23/test/webkit/string-trim.js#L33-L56
const whitespace = [
{s: '\u0009', t: 'HORIZONTAL TAB'},
{s: '\u000A', t: 'LINE FEED OR NEW LINE'},
{s: '\u000B', t: 'VERTICAL TAB'},
{s: '\u000C', t: 'FORMFEED'},
{s: '\u000D', t: 'CARRIAGE RETURN'},
{s: '\u0020', t: 'SPACE'},
{s: '\u00A0', t: 'NO-BREAK SPACE'},
{s: '\u2000', t: 'EN QUAD'},
{s: '\u2001', t: 'EM QUAD'},
{s: '\u2002', t: 'EN SPACE'},
{s: '\u2003', t: 'EM SPACE'},
{s: '\u2004', t: 'THREE-PER-EM SPACE'},
{s: '\u2005', t: 'FOUR-PER-EM SPACE'},
{s: '\u2006', t: 'SIX-PER-EM SPACE'},
{s: '\u2007', t: 'FIGURE SPACE'},
{s: '\u2008', t: 'PUNCTUATION SPACE'},
{s: '\u2009', t: 'THIN SPACE'},
{s: '\u200A', t: 'HAIR SPACE'},
{s: '\u3000', t: 'IDEOGRAPHIC SPACE'},
{s: '\u2028', t: 'LINE SEPARATOR'},
{s: '\u2029', t: 'PARAGRAPH SEPARATOR'},
{s: '\u200B', t: 'ZERO WIDTH SPACE (category Cf)'},
];
That would be locale-based assumably, and isspace() handles that:
In the "C" locale, isspace() returns non-zero for these characters only. The value
of the argument must be representable as an unsigned char or the value of EOF.
The isspace_l() function takes an explicit locale argument, whereas the isspace()
function uses the current global or per-thread locale.
But even this would require that I build codepoints to pass to isspace(), rather than the current byte-by-byte means.
Even so, this should be doable.
Thanks, Phil