peggy
Fix all uses of String.prototype.substr
See the deprecation warning on MDN.
It looks like we can substitute String.prototype.slice, including in the generated code. Think about edge cases of NaN, negative, and swapped start and end in all cases.
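The edge cases mentioned above can be checked directly. The following is a minimal sketch, not the project's code: `substrViaSlice` is a hypothetical helper that naively turns (start, length) into (start, end). It suggests the two forms agree whenever the start is a non-negative integer, and diverge for a negative or NaN start.

```javascript
// Hypothetical helper: naive translation of substr(start, length) to slice(start, end).
function substrViaSlice(str, start, length) {
  return str.slice(start, start + length);
}

const s = "abcdef";

// Non-negative integer start: both forms agree, including past the end of the string.
const agree = [
  [2, 3],  // "cde"
  [4, 5],  // "ef" (clamped at the end of the string)
  [6, 3],  // ""   (start at the end of the string)
  [2, 0],  // ""
  [2, -1], // ""   (negative length, i.e. end before start)
].every(([start, len]) => s.substr(start, len) === substrViaSlice(s, start, len));

// Negative or NaN start: the naive translation diverges from substr.
const negative = [s.substr(-2, 2), substrViaSlice(s, -2, 2)]; // ["ef", ""]
const nan = [s.substr(NaN, 2), substrViaSlice(s, NaN, 2)];    // ["ab", ""]

// substring() swaps reversed arguments; slice() returns an empty string instead.
const swapped = [s.substring(4, 2), s.slice(4, 2)];           // ["cd", ""]

console.log({ agree, negative, nan, swapped });
```

If the position passed in is always a non-negative integer and the length is a non-negative literal (which seems likely for a parser position, but is an assumption here), only the first group of cases applies.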
Why? Half the web depends on it, and it's often more convenient for parsers than slice or substring. At the very least it shouldn't block any releases.
I'm not sure why it's more convenient than slice or substring. Can you say more?
To check that a contains b starting at a certain position (with correct handling of a potential end of string), one usually writes a.substr(peg$pos, b.length) === b. The other two methods take the end position instead of a length, so without substr the end has to be computed as peg$pos + b.length for no good reason.
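For comparison, here are the two ways of writing that check side by side. This is only a sketch: `matchWithSubstr` and `matchWithSlice` are hypothetical names, and `input` and `pos` stand in for the parser's input string and peg$pos.

```javascript
// Hypothetical helpers; pos stands in for the parser's current position.
function matchWithSubstr(input, pos, literal) {
  return input.substr(pos, literal.length) === literal;
}

function matchWithSlice(input, pos, literal) {
  // slice takes an end index, so the end is computed from the length.
  return input.slice(pos, pos + literal.length) === literal;
}

const input = "let x";

// Ordinary match at the start of the input.
console.log(matchWithSubstr(input, 0, "let"), matchWithSlice(input, 0, "let")); // true true

// The literal would run past the end of the input: both forms return a
// shorter string rather than throwing, so the comparison is simply false.
console.log(matchWithSubstr(input, 4, "xyz"), matchWithSlice(input, 4, "xyz")); // false false
```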
As I understand it, substr will be there forever in JS implementations, along with the ordering of fields in objects, the <marquee> tag, the with statement, and the quirky behavior of sort, because:
- otherwise early websites would stop operating;
- there is no graceful degradation in case it ceases to exist;
- removing it from an implementation of a JS engine produces barely any profit;
- prototypes of globals can't really change in strict mode.
TLDR I don't see how following this deprecation might improve anything.
Not sure if this was just abandoned on the grounds that substr will never be removed, but I did some investigation.
- The only place substr is used now is in generated code (and one reference from docs/vendor/codemirror).
- If I add a helper function, peg$substr, at the same level as peg$currPos, that just takes a length parameter and returns input.slice(peg$currPos, peg$currPos + length), and use that instead of input.substr everywhere, the generated parser is smaller (peg$substr(5) vs input.substr(peg$currPos, 5)), and I can't measure a difference in performance in any of the parsers that I've generated (and I've tried with parser/input combos that take hundreds of ms).
- I wrote a small test case that sets up something comparable to the parser and just calls my new substr, or input.substr, in a loop, and there I see a very small slowdown. Maybe as much as 10%, but there's so much noise it's hard to tell (on some runs my helper is actually faster, but it's slower more often than not).
In summary, it seems like switching over is easy, there's a code size win (not a big deal), and maybe a very small performance penalty, but not measurable in real cases.
Here's the test case, in case I'm making a systematic error of some sort:
const fs = require("node:fs/promises");

function wrapper(input) {
  let currPos = 0;

  // Helper under test: takes only a length, like the proposed peg$substr.
  function substr(n) {
    return input.slice(currPos, currPos + n);
  }

  // Walk the input in 5-character chunks using String.prototype.substr.
  function parseOld() {
    currPos = 0;
    while (input.substr(currPos, 5) !== "") {
      currPos += 5;
    }
  }

  // Same walk, using the slice-based helper.
  function parseNew() {
    currPos = 0;
    while (substr(5) !== "") {
      currPos += 5;
    }
  }

  return [parseOld, parseNew];
}

fs.readFile("<a 20k text file>").then(
  (data) => {
    const [parseOld, parseNew] = wrapper(data.toString());

    function test(testFn, which) {
      const start = Date.now();
      for (let i = 0; i < 10000; i++) {
        testFn();
      }
      const end = Date.now();
      console.log(`${which} Took ${end - start}ms`);
    }

    test(parseOld, "parseOld");
    test(parseNew, "parseNew");
  }
);
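Since the numbers are noisy, one way to reduce the noise is to warm up both functions, alternate which one runs first, and compare medians over several rounds. The variant below is a sketch under stated assumptions: it uses a synthetic 20,000-character string instead of the original text file, and `timeOnce`, `median`, `compare`, and the round and iteration counts are all names and values made up for illustration.

```javascript
const { performance } = require("node:perf_hooks");

// Synthetic stand-in for the 20k text file (an assumption, not the original input).
const input = "x".repeat(20000);
let currPos = 0;

function substr(n) {
  return input.slice(currPos, currPos + n);
}

// Both parse functions return the chunk count so the results can be compared.
function parseOld() {
  currPos = 0;
  let chunks = 0;
  while (input.substr(currPos, 5) !== "") {
    currPos += 5;
    chunks++;
  }
  return chunks;
}

function parseNew() {
  currPos = 0;
  let chunks = 0;
  while (substr(5) !== "") {
    currPos += 5;
    chunks++;
  }
  return chunks;
}

// Time a batch of calls in milliseconds.
function timeOnce(fn, iterations) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return performance.now() - start;
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function compare(rounds = 15, iterations = 200) {
  // Warm-up so both functions are likely optimized before timing starts.
  timeOnce(parseOld, iterations);
  timeOnce(parseNew, iterations);

  const oldTimes = [];
  const newTimes = [];
  for (let r = 0; r < rounds; r++) {
    // Alternate the order so neither function always runs first.
    if (r % 2 === 0) {
      oldTimes.push(timeOnce(parseOld, iterations));
      newTimes.push(timeOnce(parseNew, iterations));
    } else {
      newTimes.push(timeOnce(parseNew, iterations));
      oldTimes.push(timeOnce(parseOld, iterations));
    }
  }
  return { oldMs: median(oldTimes), newMs: median(newTimes) };
}

const result = compare();
console.log(result);
```

Medians are less sensitive than a single total to an occasional slow run, though this still measures a micro-benchmark rather than a real parser.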
I agree that substr() will live forever, but there are ways of writing "modern" JavaScript. .slice() is mirrored by Array, which makes it more consistent. You can still use "var" and "==" all you want, but I wouldn't allow it in a recent PR. I think modernizing a code-base is a good goal to have. It's just confusing to have 3 functions that basically do the same thing.
https://stackoverflow.com/questions/2243824/what-is-the-difference-between-string-slice-and-string-substring
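As a small illustration of the consistency point, slice takes the same (start, end) arguments and handles negative indices the same way on strings and arrays. The values here are made up for the example.

```javascript
const text = "parser";
const items = ["p", "a", "r", "s", "e", "r"];

// Same (start, end) arguments on a string and on an array.
console.log(text.slice(1, 4));  // "ars"
console.log(items.slice(1, 4)); // ["a", "r", "s"]

// Negative indices count from the end in both cases.
console.log(text.slice(-2));    // "er"
console.log(items.slice(-2));   // ["e", "r"]
```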