Avoid match looping in sourceMappingURL tag detection by using negative lookahead
I think that a single regex is better than looping a regex multiple times. The code is smaller, using const and such.
I can't disagree with code simplicity, do you have benchmarks? I mean, regexp is not the nicest thing to maintain
I can't disagree with code simplicity, do you have benchmarks?
I didn't run any benchmark. Going to do a PoC with iso-bench and post the result.
I mean, regexp is not the nicest thing to maintain
Yeah, that's true... 😔
EDIT: Invalid test. Rewriting...
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 90.07%. Comparing base (
5fe7800) to head (78040e3). Report is 37 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #58704 +/- ##
==========================================
- Coverage 90.16% 90.07% -0.09%
==========================================
Files 637 640 +3
Lines 188001 188287 +286
Branches 36881 36915 +34
==========================================
+ Hits 169509 169602 +93
- Misses 11238 11400 +162
- Partials 7254 7285 +31
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Done. Summary:
- A bit of improvement on single comments, and equalizing the more lines the file has (from
1.16xto1.00x). - A lot of worsening on multiple comments, but getting a bit better the more lines the file has (from
17.82xto14.49x).
I'm not sure the amount of files with multiple comments that we are going to find. Still, can process from thousands to millions of files per second.
I agree that this is making the regex more difficult to read, though. Negative lookaheads are tricky.
I don't know how do you feel about this. I don't event know how I feel about this. In non-performance-critical places I prefer simpler code, and I'm not sure if processing thousands of sourcemaps per second is considered low performance in this context. I don't know how many projects with files with 2000+ lines and multiple sourcemap comments are around there. Still, if the project has 500 files like that (wich are a lot in my opinion), the performance difference is not going to be noticeable I think.
Benchmark code:
const { IsoBench } = require("iso-bench");
const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;
const singleComment = {
"2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${`test string yay ok\n`.repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${`test string yay ok\n`.repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${`test string yay ok\n`.repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
"2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${new Array(19).fill(0).map((_, i) => {
return i % 10 === 9 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${new Array(199).fill(0).map((_, i) => {
return i % 100 === 99 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${new Array(1999).fill(0).map((_, i) => {
return i % 1000 === 999 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
function matchNegativeLookahead(code) {
const match = negativeLookaheadRegex.exec(code);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchLoop(code) {
let match;
let lastMatch;
while ((match = multipleMatchRegex.exec(code))) {
lastMatch = match;
}
if (lastMatch == null || !lastMatch.groups) {
return null;
}
return lastMatch.groups.sourceMappingURL;
}
function add(bench, code) {
return bench.add("negative lookahead", () => {
return matchNegativeLookahead(code);
}).add("match loop", () => {
return matchLoop(code);
});
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
bench.consoleLog().run();
Result:
Ok, I tried with different approaches, and I think that I found a good balance between performance an code, as performance is good (even better with single and multiple comment lines), has less code, uses const and avoids explicit loops.
I tested:
matchNegativeLookahead: Using a negative lookahead withconst.matchLoop: The original one usingexecloops andlet.matchReduce: UsingmatchAllandreduceto get the last element withconst.matchArray: UsingmatchAll,Array.fromandatto get the last element withconst.matchGlobal: Using a globalmatchandatto get the last element withconst, and thenindexOftosubstringthe URL part.
Personally, matchGlobal is the one I like the most, as has no explicit loops, less code, uses const and has a good performance.
If the reviewer is OK with this, I can push a commit with the matchGlobal code.
EDIT: Added "no comment" benchmarks.
Benchmark code:
const { IsoBench } = require("iso-bench");
const negativeLookaheadRegex = /(\/[*/]#\s+sourceMappingURL=)(?<sourceMappingURL>[^\s]+)(?![\s\S]*\1)/;
const globalMatch = /\/[*/]#\s+sourceMappingURL=[^\s]+/g;
const multipleMatchRegex = /\/[*/]#\s+sourceMappingURL=(?<sourceMappingURL>[^\s]+)/g;
const noComment = {
"2lines": "test string yay ok\n".repeat(2),
"20lines": "test string yay ok\n".repeat(20),
"200lines": "test string yay ok\n".repeat(200),
"2000lines": "test string yay ok\n".repeat(2000)
};
const singleComment = {
"2lines": `test string yay ok
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${"test string yay ok\n".repeat(19)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${"test string yay ok\n".repeat(199)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${"test string yay ok\n".repeat(1990)}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
const doubleComment = {
"2lines": `//# sourceMappingURL=g44tgbh463h643h6
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"20lines": `${new Array(19).fill(0).map((_, i) => {
return i === 10 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"200lines": `${new Array(199).fill(0).map((_, i) => {
return i === 100 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`,
"2000lines": `${new Array(1999).fill(0).map((_, i) => {
return i === 1000 ? `//# sourceMappingURL=g44tgbh463h643h6${i}` : `test string yay ok ${i}`;
}).join("\n")}}
//# sourceMappingURL=erfgasdfgasdfgasdgasdg`
};
// Test functions
function matchNegativeLookahead(code) {
const match = negativeLookaheadRegex.exec(code);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchLoop(code) {
let match;
let lastMatch;
while ((match = multipleMatchRegex.exec(code))) {
lastMatch = match;
}
if (lastMatch == null || !lastMatch.groups) {
return null;
}
return lastMatch.groups.sourceMappingURL;
}
function matchReduce(code) {
const match = code.matchAll(multipleMatchRegex).reduce((_, match) => match, null);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchArray(code) {
const match = Array.from(code.matchAll(multipleMatchRegex)).at(-1);
if (match == null || !match.groups) {
return null;
}
return match.groups.sourceMappingURL;
}
function matchGlobal(code) {
const match = code.match(globalMatch)?.at(-1);
if (match == null) {
return null;
}
return match.substring(match.indexOf("=") + 1);
}
// End test functions
function add(bench, code) {
return bench.add("match loop (original)", () => {
return matchLoop(code);
}).add("match global (new one I like)", () => {
return matchGlobal(code);
}).add("negative lookahead (meh on multiline)", () => {
return matchNegativeLookahead(code);
}).add("match reduce (meh)", () => {
return matchReduce(code);
}).add("match array (meh)", () => {
return matchArray(code);
});
}
const bench = new IsoBench();
add(bench, singleComment["2lines"]).endGroup("Single comment: 2 lines");
add(bench, singleComment["20lines"]).endGroup("Single comment: 20 lines");
add(bench, singleComment["200lines"]).endGroup("Single comment: 200 lines");
add(bench, singleComment["2000lines"]).endGroup("Single comment: 2000 lines");
add(bench, doubleComment["2lines"]).endGroup("Multiple comment: 2 lines");
add(bench, doubleComment["20lines"]).endGroup("Multiple comment: 20 lines");
add(bench, doubleComment["200lines"]).endGroup("Multiple comment: 200 lines");
add(bench, doubleComment["2000lines"]).endGroup("Multiple comment: 2000 lines");
add(bench, noComment["2lines"]).endGroup("No comment: 2 lines");
add(bench, noComment["20lines"]).endGroup("No comment: 20 lines");
add(bench, noComment["200lines"]).endGroup("No comment: 200 lines");
add(bench, noComment["2000lines"]).endGroup("No comment: 2000 lines");
bench.consoleLog().run();
Benchmark results:
I'm not sure what subsystem this applies to...
I think maybe src: is fine.
Could also use process:
If you ever are unsure, i recommend looking at what other PRs used that previously modified this code.
Ok, I'm going with lib as it was the very previous commit for these lines that I'm changing.
@legendecas I've modified the eval test. Do you think that's enough or you prefer a single file specific for this?
@legendecas I noticed a problem with the negative lookahead regex. If you have mixed //# sourceMappingURL= and /* sourceMappingURL= comments it will fail, so instead of explaining the regexes, I went by simplifying the regexes and using the matchGlobal explained here https://github.com/nodejs/node/pull/58704#issuecomment-2972484345 which doesn't have that problem.
It also has more performance than the negative lookahead and simplified both the negative lookahead regex and the original exec loop code, so I think that is a better approach. What do you think?
Ok, at this point I don't know what is happening. A linter about prototype pollution fails with StringPrototypeMatch and I'm not sure why as I'm using the primordial like all the other StringPrototypeX.
Ok, at this point I don't know what is happening. A linter about prototype pollution fails with
StringPrototypeMatchand I'm not sure why as I'm using the primordial like all the otherStringPrototypeX.
Checkout https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#description for explaination about the prototype pollution (i.e. RegExp.prototype[Symbol.match]() in this case). An alternative to StringPrototypeMatch is using RegExpPrototypeSymbolMatch directly
Checkout https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#description for explaination about the prototype pollution (i.e.
RegExp.prototype[Symbol.match]()in this case). An alternative toStringPrototypeMatchis usingRegExpPrototypeSymbolMatchdirectly
Ahhh makes sense! Thank you.
Now it complains that RegExpPrototypeSymbolMatch looks up the exec property. I'm going to close the PR as I don't have time for this :-(