splitargs
Fails on nested+escaped quotes
create door with inscription "user beware: \"temet nosce meow\""
result [ 'create', 'door', 'with', 'inscription', 'user beware: temet', 'nosce', 'meow' ]
expected [ 'create', 'door', 'with', 'inscription', 'user beware: "temet nosce meow"' ]
I confirm it. Thanks for reporting the bug. I will investigate and fix.
more information
var i5 = `create door with inscription "user beware: \\"temet nosce meow\\""`;
var o5 = splitargs(i5);
console.log(o5)
returns
[ 'create', 'door', 'with', 'inscription', 'user beware: \\temet', 'nosce', 'meow\\' ]
should return
[ 'create', 'door', 'with', 'inscription', 'user beware: \\"temet nosce meow\\"' ]
or maybe
[ 'create', 'door', 'with', 'inscription', 'user beware: "temet nosce meow"' ]
this is on node -v v10.16.0
working solution (also attached: foo.txt)
#!/usr/bin/env -S node --experimental-modules
var splitargs = require('splitargs');
var i5 = `create door with inscription "user beware: \\"temet nosce meow\\""`;
var o5 = splitargs(i5);
console.log(o5)
const regex = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;
const str = i5;
const result = [];
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
    result.push(match)
  });
}
console.log(result)
prints
[ 'create', 'door', 'with', 'inscription', '"user beware: \\"temet nosce meow\\""' ]
See https://regex101.com/r/EbmqWJ/1 for the regex experiment. The original author's answer is here: https://stackoverflow.com/questions/4031900/split-a-string-by-whitespace-keeping-quoted-segments-allowing-escaped-quotes/40120309#40120309
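The regex output above still carries the outer quotes and the backslashes, while the expected result in the report has neither. A minimal sketch of one way to close that gap, assuming the unrolled form of the regex (with its `*` quantifiers and doubled backslashes intact): tokenize first, then strip the quotes and resolve escapes inside each quoted segment. `splitQuoted` is a hypothetical name, not part of splitargs.

```javascript
// Hypothetical helper, not part of splitargs: tokenize with the regex from
// the Stack Overflow answer, then unwrap and unescape quoted segments.
function splitQuoted(input) {
  // One token: bare characters, optionally mixed with "double-quoted" runs.
  const tokenRe = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;
  // One quoted run inside a token; group 1 is the body without the quotes.
  const quotedRe = /"([^\\"]*(?:\\[\s\S][^\\"]*)*)"/g;
  const result = [];
  let m;
  while ((m = tokenRe.exec(input)) !== null) {
    if (m[0] === '') {
      // Zero-width match: a stray, unbalanced quote. Step over it so the
      // loop cannot spin forever on the same index.
      tokenRe.lastIndex++;
      continue;
    }
    // Drop the surrounding quotes and turn \x into x inside the quoted body.
    result.push(
      m[0].replace(quotedRe, (_, body) => body.replace(/\\([\s\S])/g, '$1'))
    );
  }
  return result;
}

const input = String.raw`create door with inscription "user beware: \"temet nosce meow\""`;
console.log(splitQuoted(input));
```

Backslashes outside of double quotes are left alone in this sketch, and an unbalanced quote is silently dropped; whether that is the right behaviour for splitargs is an open question.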
Thanks for the prompt response.
I thought about it some more and I think it may be best to use https://pegjs.org/ for this. Take a look at this line in a peg based JSON parser: https://github.com/pegjs/pegjs/blob/0b102d29a86254a50275b900706098aeca349740/examples/json.pegjs#L116
What I wrote above works, but the regular expression with its lookahead assertion is hard to understand, and there may be more unintended behaviour with deeper nesting or when several quote characters (as in `, ', ") are needed at the n-th nesting level. In general, parsing anything that demands nesting is better done with a recursive parser, such as one generated by peg, than with a regex. In the end peg is better because the code stays readable compared to: /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g; The next person to complain about your code will probably arrive in another three years, and reading that regex again will be three times as painful.
Anyway, I hope you will set off on the peg parser learning adventure. It is hard at first, but the power to create languages is very rewarding.
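For comparison, here is a rough sketch of the character-by-character approach that a grammar would formalize. It is written by hand and does not use peg; `tokenize` and its rules (backslash escapes the next character, the three quote characters ", ' and ` each close only on themselves) are assumptions for illustration, not splitargs behaviour.

```javascript
// Hypothetical hand-written tokenizer; a peg grammar would express the same
// rules declaratively. Not generated by peg and not part of splitargs.
function tokenize(input) {
  const tokens = [];
  let current = '';     // text of the token being built
  let inToken = false;  // true once a token has started (so "" yields '')
  let quote = null;     // the quote character we are inside, or null

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '\\' && i + 1 < input.length) {
      // A backslash makes the next character literal, inside or outside quotes.
      current += input[++i];
      inToken = true;
    } else if (quote !== null) {
      // Inside quotes: only the matching quote character ends the run.
      if (ch === quote) quote = null;
      else current += ch;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch;
      inToken = true;
    } else if (/\s/.test(ch)) {
      // Unquoted whitespace ends the current token, if there is one.
      if (inToken) {
        tokens.push(current);
        current = '';
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new SyntaxError('unterminated ' + quote + ' quote');
  if (inToken) tokens.push(current);
  return tokens;
}

console.log(tokenize(String.raw`create door with inscription "user beware: \"temet nosce meow\""`));
console.log(tokenize(`say 'it is "fine"' now`));
```

Unlike the regex, this version reports an unterminated quote instead of guessing, and a different quote character inside a quoted run needs no escaping.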
@fantasyui-com thanks for the link. The problem is harder than it looks. In the string, there seems to be no way to differentiate " from \". I think a good regex will probably work better, though it's a little hard to comprehend.
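One possible reading of this remark is that the backslash is already gone before splitargs ever sees the input: in an ordinary JavaScript string literal, \" is just an escape for ". A small sketch of that effect, using a shortened version of the example string (the variable names are made up for illustration):

```javascript
// In an ordinary string literal, \" is an escape for ": the backslash never
// reaches the string, so no splitter can tell the two apart afterwards.
const plain = 'inscription "user beware: \"temet nosce meow\""';
console.log(plain.includes('\\')); // false

// Doubling the backslash, or using String.raw, keeps it in the string.
const doubled = 'inscription "user beware: \\"temet nosce meow\\""';
const raw = String.raw`inscription "user beware: \"temet nosce meow\""`;
console.log(doubled === raw);      // true
console.log(raw.includes('\\'));   // true
```

If the input comes from a file or a prompt rather than from a literal in source code, the backslashes are preserved as typed and the two cases can be told apart.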