splitargs

Fails on nested+escaped quotes

Open fantasyui-com opened this issue 5 years ago • 4 comments

create door with inscription "user beware: \"temet nosce meow\""

result [ 'create', 'door', 'with', 'inscription', 'user beware: temet', 'nosce', 'meow' ]

expected [ 'create', 'door', 'with', 'inscription', 'user beware: "temet nosce meow"' ]

fantasyui-com, Aug 10 '19 22:08

I confirm it. Thanks for reporting the bug. I will investigate and fix.

elgs, Aug 10 '19 22:08

more information

var i5 = `create door with inscription "user beware: \\"temet nosce meow\\""`;
var o5 = splitargs(i5);
console.log(o5)

returns

[ 'create', 'door', 'with', 'inscription', 'user beware: \\temet', 'nosce', 'meow\\' ]

should return

[ 'create', 'door', 'with', 'inscription', 'user beware: \\"temet nosce meow\\"' ]

or maybe

[ 'create', 'door', 'with', 'inscription', 'user beware: "temet nosce meow"' ]

This is on Node v10.16.0 (output of node -v).

working solution (also attached: foo.txt)

#!/usr/bin/env -S node --experimental-modules
var splitargs = require('splitargs');

var i5 = `create door with inscription "user beware: \\"temet nosce meow\\""`;
var o5 = splitargs(i5);
console.log(o5)

const regex = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;
const str = i5;
const result = [];
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }

  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
    result.push(match)
  });
}

console.log(result)
// [ 'create', 'door', 'with', 'inscription', '"user beware: \\"temet nosce meow\\""' ]

See https://regex101.com/r/EbmqWJ/1 for the regex experiment. The original author's answer is here: https://stackoverflow.com/questions/4031900/split-a-string-by-whitespace-keeping-quoted-segments-allowing-escaped-quotes/40120309#40120309
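Note that the regex keeps the delimiting quotes and the backslashes in the matched token. If the goal is the plain `user beware: "temet nosce meow"` form, a post-processing step is needed. Below is a minimal sketch of that step. The pattern is my reading of the linked answer and `unquote` is a hypothetical helper, so treat both as assumptions rather than as what splitargs does.

```javascript
// Assumed pattern: unquoted runs mixed with double-quoted runs that may
// contain backslash escapes.
const tokenPattern = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

// Hypothetical helper: drop the delimiting quotes and turn \x into x.
function unquote(token) {
  return token.replace(/\\([\s\S])|"/g, (whole, escaped) =>
    escaped === undefined ? '' : escaped);
}

// String.raw keeps the backslashes in the runtime string.
const input = String.raw`create door with inscription "user beware: \"temet nosce meow\""`;

const tokens = (input.match(tokenPattern) || [])
  .filter((token) => token !== '') // skip zero-width matches from unbalanced quotes
  .map(unquote);

console.log(tokens);
```

With balanced quotes, as here, the filter removes nothing. It is only a guard for inputs where a quote is never closed.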

Thanks for the prompt response.

fantasyui-com, Aug 10 '19 22:08

I thought about it some more and I think it may be best to use https://pegjs.org/ for this. Take a look at this line in a PEG-based JSON parser: https://github.com/pegjs/pegjs/blob/0b102d29a86254a50275b900706098aeca349740/examples/json.pegjs#L116

What I wrote above works, but the regular expression with lookahead assertions kind of sucks to understand, and there may be more unintended behaviour with nested nestings or a need for multiple quote characters (as in `, ', ") in an n-th nesting. In general, parsing anything that demands nesting should be done with a recursive parser like peg rather than a regex. In the end peg is better because the code readability is good compared to: /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g; The next twat to complain about your code will probably arrive in another three years, and reading /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g; will suck thrice as bad.

Anyway, I hope you will set off on the peg parser learning adventure, it is hard at first; but the power to create languages is very rewarding.
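Short of a full peg grammar, the same idea (an explicit parser instead of a lookahead regex) can be sketched as a small hand-written scanner. This is only an illustration: `splitQuoted` is a hypothetical name, it is neither splitargs' code nor peg output, and it assumes double quotes only, whitespace as the separator, and a backslash escaping whatever character follows it.

```javascript
// Hypothetical helper: splits on whitespace, keeps double-quoted segments
// together, and treats backslash + any character as an escaped character.
function splitQuoted(input) {
  const tokens = [];
  let current = '';
  let inQuotes = false;
  let hasToken = false; // true once a token has started (so "" yields '')

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '\\' && i + 1 < input.length) {
      current += input[++i]; // keep the escaped character, drop the backslash
      hasToken = true;
    } else if (ch === '"') {
      inQuotes = !inQuotes; // quotes delimit but are not part of the token
      hasToken = true;
    } else if (/\s/.test(ch) && !inQuotes) {
      if (hasToken) tokens.push(current);
      current = '';
      hasToken = false;
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (hasToken) tokens.push(current);
  return tokens;
}

// String.raw keeps the backslashes in the runtime string.
const command = String.raw`create door with inscription "user beware: \"temet nosce meow\""`;
console.log(splitQuoted(command));
```

Each branch handles exactly one case (escape, quote, separator, ordinary character), which is the readability argument for a parser over the regex. Supporting more quote characters would mean tracking which quote opened the current segment instead of a single boolean.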

fantasyui-com, Aug 11 '19 13:08

@fantasyui-com thanks for the link. The problem is harder than it looks. In the string, there seems to be no way to differentiate " from \". I think a good regex will probably work better, though it's a little hard to comprehend.
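One way to read the point about " versus \": when the input is written as an ordinary JavaScript string literal, the JavaScript parser consumes the backslash before any splitting code sees the string. The sketch below only illustrates that language behaviour, with made-up sample strings. It makes no claim about splitargs internals.

```javascript
// In a normal string literal the backslash is consumed by the JavaScript
// parser, so the runtime string holds only plain double quotes.
const literal = "say \"hi\"";

// Doubling the backslash, or using String.raw, keeps it in the runtime string.
const doubled = "say \\\"hi\\\"";
const raw = String.raw`say \"hi\"`;

console.log(literal.includes("\\"));      // false
console.log(doubled === raw);             // true
console.log(raw.length - literal.length); // 2
```

So a splitter can only honour escaped quotes if the backslashes actually reach it, for example when the text comes from user input or a file, or when the literal is written with doubled backslashes.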

elgs, Aug 13 '19 17:08