Issue 12053- CesiumJS should not use push.apply
Description
Change push.apply to use for loops to avoid exceeding the maximum call stack.
Issue number and link
#12053
Testing plan
Author checklist
- [x] I have submitted a Contributor License Agreement
- [x] I have added my name to
CONTRIBUTORS.md - [x] I have updated
CHANGES.mdwith a short summary of my change - [x] I have added or updated unit tests to ensure consistent code coverage
- [x] I have updated the inline documentation, and included code examples where relevant
- [x] I have performed a self-review of my code
Thank you for the pull request, @Tim-Quattrochi!
:white_check_mark: We can confirm we have a CLA on file for you.
Thanks @Tim-Quattrochi!
@javagl Are you available to review?
Not to distract from the main point here, but:
the pattern that is often used in the CesiumJS source code, for exactly this reason, namely, to even pull out the length.
I have actually been changing this pattern… because:
- it is extra mental overhead when reading the code
- The pattern
for (let i = 0; i < array.length; i++)is so boilerplate that the.lengthaccessor is already optimized out (stored in a register) by modern JIT compilers (at least on every recent benchmark I have seen).
Having said that, I agree that exampleObject.getSources() should be precomputed. The compiler will not know what to do with custom getters.
@jjhembd I agree that not storing the array.length can increase readability (and avoid certain bugs). The performance was once said to be the reason for pulling it out.
Reliable benchmarks can be difficult, and the JIT is just one very large black box. E.g. does it only store that length in a register when it is actually an array, or also when length is a custom property? Or when it is a 'getter' for a custom property that was determined to not be affected by the loop...? In any case, it's a pity that the Iterable construct does not seem to be JITted. The results in https://github.com/CesiumGS/cesium/pull/11697#issuecomment-1889968947 have been pretty disappointing in that regard...
EDIT: I already mentioned in another comment that we might consider updating the coding guide to no longer recommend the const n = array.length pattern
@jjhembd I agree that not storing the
array.lengthcan increase readability (and avoid certain bugs). The performance was once said to be the reason for pulling it out.Reliable benchmarks can be difficult, and the JIT is just one very large black box. E.g. does it only store that
lengthin a register when it is actually an array, or also whenlengthis a custom property? Or when it is a 'getter' for a custom property that was determined to not be affected by the loop...? In any case, it's a pity that theIterableconstruct does not seem to be JITted. The results in #11697 (comment) have been pretty disappointing in that regard...EDIT: I already mentioned in another comment that we might consider updating the coding guide to no longer recommend the
const n = array.lengthpattern
@javagl If we were to keep the current pattern, how would you ensure it stays DRY? My thought was to create a utility function or perhaps a method, but I wanted to get your opinion. Repeating the length variable and source for each conditional seems like an anti-pattern to me.
From quickly scrolling over the changes, it seems like the inlined comments have been addressed. I'll do a (probably final) review pass early tomorrow (and just for completeness, try out the "artificial" model that I created to provoke the error).
About storing the length: (I'm not even sure whether that question was addressed at me): I don't have a strong opinion there. The main point was about storing things like the getSources() (in the example). And I mentioned the pattern of storing the length as const n = something.length; mainly because that was used in most of the codebase, and even in snippets in the coding guide. But using the something.length directly in the loop is also fine for me.
@jjhembd Do you think that new/modified code should consistently use the pattern without storing the length?
The strong(est) case that I could make against storing the length is when there are "many" arrays and loops in one method, and one has to distinguish the names (i.e. when it cannot be n, but has to be array0Length.. arrayNLength). Is this what your comment about DRY referred to? I'm not sure how a utility function/method could help here...
It would be great if these considerations could be obsolete: I think that the for (const element of array) should be the idiomatic code. (You actually used this in one place - not sure whether that was intentional). Unfortunately, JavaScript compilers/JITs don't seem to be very smart, and these loops seem to be much slower, even though they could trivially be translated into the index-based ones by the compiler or JIT...
About storing the
length: (I'm not even sure whether that question was addressed at me): I don't have a strong opinion there. The main point was about storing things like thegetSources()(in the example). And I mentioned the pattern of storing the length asconst n = something.length;mainly because that was used in most of the codebase, and even in snippets in the coding guide. But using thesomething.lengthdirectly in the loop is also fine for me.
@javagl Yes it was, sorry I '@' the wrong person. I was just referring to the fact that I have to repeat myself a lot. It was a general best practice question, not necessarily something I am introducing. For example, I have to create a variable to point at the source, create a target, create a variable to hold the length of the source. This can happen a lot in a method that has more conditions. An example is here: packages/engine/Source/Scene/Cesium3DTileBatchTable.js lines 375, 376, then again 382, 387. Would it be good practice to create a function to:
- create a reference to the source.
- create a variable for the length of the source.
- create a target.
- loop through and push.
- return the result.
Again, this is just a general question and I am not suggesting Cesium do this.
It would be great if these considerations could be obsolete: I think that the
for (const element of array)should be the idiomatic code. (You actually used this in one place
I don't believe I refactored those after changes requested. I can have them changed and pushed to this PR before you do a review, and yes I will absolutely try out the artificial model that you created.
Would it be good practice to create a function to: ...
I misunderstood that first. I thought that you referred to a function that can "always" replace such a for-loop, in the most generic form. But now I see that you only referred to the push.apply case here.
And indeed, such a function could make sense, and it could be pretty straightforward, like
function pushAll(source, target) {
for (let i=0; i<source.length; i++){
target.push(source[i]);
}
}
This would turn a code block like the one that you linked to from this
const hierarchyPropertyIdsLength = hierarchyPropertyIds.length;
for (let i = 0; i < hierarchyPropertyIdsLength; ++i) {
results.push(hierarchyPropertyIds[i]);
}
into this
pushAll(hierarchyPropertyIds, results);
One could even pre-allocate the space in the target array, to avoid repeated allocations/resize operations.
Now... I'm a bit torn:
- On the one hand, I don't want to hold up your PR, or let it become "obsolete" by suggesting to introduce such a function
- On the other hand, I'd be strongly in favor of introducing such a function
- (On the third hand, not being part of the code CesiumJS squad, I probably cannot make the final decision here)
But even at the risk of making most of the changes here obsolete, I'd vote for introducing such a function, because it could tremendously increase readability and maintanability (and performance), with little effort and bascially zero risk.
I just threw together some ""benchmark"", to compare a few things here. (I'm not a "JavaScript performance testing expert", so this has to be taken with a huge grain of salt...).
The benchmark creates arrays of different sizes, repeatedly puts the contents of these source arrays into target arrays, and prints the timing information. The approaches that are compared here are
- The
push.apply(causing errors for larger arrays) - The
...spreadoperator (also causing errors) - An index-based
forloop (the approach that is now used in this PR) - A
foreachloop (just for comparison) - An index-based
forloop with preallocated target array size, as it could be done easily and transparently(!) in such a function
(Some results are given below)
The code:
function runSingleTest(sourceSize, testFunction) {
const sourceArray = Array(sourceSize);
const targetArray = [];
const startMs = performance.now();
try {
testFunction(sourceArray, targetArray);
} catch (errror) {
return undefined;
}
const endMs = performance.now();
const durationMs = endMs - startMs;
return durationMs;
}
function runTest(runs, sourceSize, name, testFunction) {
let sumMs = 0.0;
for (r = 0; r < runs; r++) {
const ms = runSingleTest(sourceSize, testFunction);
if (ms === undefined) {
console.log(`sourceSize ${sourceSize}, name ${name}, ERROR`);
return;
}
sumMs += ms;
}
const averageMs = sumMs / runs;
console.log(`sourceSize ${sourceSize}, name ${name}, averageMs ${averageMs}`);
}
function usePushApply(sourceArray, targetArray) {
targetArray.push.apply(targetArray, sourceArray);
}
function useSpread(sourceArray, targetArray) {
targetArray.push(...sourceArray);
}
function useForEachLoop(sourceArray, targetArray) {
for (const element of sourceArray) {
targetArray.push(element);
}
}
function useForLoop(sourceArray, targetArray) {
for (let i = 0; i < sourceArray.length; i++) {
targetArray.push(sourceArray[i]);
}
}
function useForLoopPreallocated(sourceArray, targetArray) {
const t = targetArray.length;
targetArray.length += sourceArray.length;
for (let i = 0; i < sourceArray.length; i++) {
targetArray[t + i] = sourceArray[i];
}
}
function test() {
const runs = 100;
for (let n = 100; n <= 1000000; n *= 10) {
runTest(runs, n, "pushApply", usePushApply);
runTest(runs, n, "spread", useSpread);
runTest(runs, n, "forEach", useForEachLoop);
runTest(runs, n, "for", useForLoop);
runTest(runs, n, "forAlloc", useForLoopPreallocated);
}
}
try {
test();
} catch (error) {
console.log(error);
}
Some results (a summary will be given below):
On NodeJS:
sourceSize 100, name pushApply, averageMs 0.0016129999561235308
sourceSize 100, name spread, averageMs 0.006134000001475215
sourceSize 100, name forEach, averageMs 0.018598999972455205
sourceSize 100, name for, averageMs 0.007068000007420778
sourceSize 100, name forAlloc, averageMs 0.0077609999664127825
sourceSize 1000, name pushApply, averageMs 0.005562999998219311
sourceSize 1000, name spread, averageMs 0.003792000007815659
sourceSize 1000, name forEach, averageMs 0.0369120000069961
sourceSize 1000, name for, averageMs 0.02806300004012883
sourceSize 1000, name forAlloc, averageMs 0.0023279999662190677
sourceSize 10000, name pushApply, averageMs 0.019967000023461877
sourceSize 10000, name spread, averageMs 0.019638000009581446
sourceSize 10000, name forEach, averageMs 0.038648000014945866
sourceSize 10000, name for, averageMs 0.038686000015586613
sourceSize 10000, name forAlloc, averageMs 0.012039000024087728
sourceSize 100000, name pushApply, averageMs 1.8059109999565408
sourceSize 100000, name spread, averageMs 1.744411000018008
sourceSize 100000, name forEach, averageMs 1.2372270000027492
sourceSize 100000, name for, averageMs 1.229792999974452
sourceSize 100000, name forAlloc, averageMs 0.39260800000280144
sourceSize 1000000, name pushApply, ERROR
sourceSize 1000000, name spread, ERROR
sourceSize 1000000, name forEach, averageMs 21.47643099999521
sourceSize 1000000, name for, averageMs 21.594332000017165
sourceSize 1000000, name forAlloc, averageMs 4.8390970000252125
On Chrome in jsfiddle
"sourceSize 100, name pushApply, averageMs 0"
"sourceSize 100, name spread, averageMs 0"
"sourceSize 100, name forEach, averageMs 0.006999999997206032"
"sourceSize 100, name for, averageMs 0.003000000002793968"
"sourceSize 100, name forAlloc, averageMs 0.00400000000372529"
"sourceSize 1000, name pushApply, averageMs 0.002999999998137355"
"sourceSize 1000, name spread, averageMs 0.003999999999068677"
"sourceSize 1000, name forEach, averageMs 0.01600000000093132"
"sourceSize 1000, name for, averageMs 0.00400000000372529"
"sourceSize 1000, name forAlloc, averageMs 0.003999999999068677"
"sourceSize 10000, name pushApply, averageMs 0.01899999999906868"
"sourceSize 10000, name spread, averageMs 0.017999999998137353"
"sourceSize 10000, name forEach, averageMs 0.04199999999720603"
"sourceSize 10000, name for, averageMs 0.03300000000279397"
"sourceSize 10000, name forAlloc, averageMs 0.011999999997206032"
"sourceSize 100000, name pushApply, averageMs 1.2229999999841674"
"sourceSize 100000, name spread, averageMs 1.1540000000130386"
"sourceSize 100000, name forEach, averageMs 0.6920000000204891"
"sourceSize 100000, name for, averageMs 0.741999999973923"
"sourceSize 100000, name forAlloc, averageMs 0.1800000000046566"
"sourceSize 1000000, name pushApply, ERROR"
"sourceSize 1000000, name spread, ERROR"
"sourceSize 1000000, name forEach, averageMs 12.13999999998603"
"sourceSize 1000000, name for, averageMs 12.027999999970199"
"sourceSize 1000000, name forAlloc, averageMs 2.443000000016764"
On Firefox in jsfiddle:
"sourceSize 100, name pushApply, averageMs 0.01"
"sourceSize 100, name spread, averageMs 0.01"
"sourceSize 100, name forEach, averageMs 0"
"sourceSize 100, name for, averageMs 0"
"sourceSize 100, name forAlloc, averageMs 0.01"
"sourceSize 1000, name pushApply, averageMs 0.06"
"sourceSize 1000, name spread, averageMs 0.03"
"sourceSize 1000, name forEach, averageMs 0"
"sourceSize 1000, name for, averageMs 0"
"sourceSize 1000, name forAlloc, averageMs 0"
"sourceSize 10000, name pushApply, averageMs 0.36"
"sourceSize 10000, name spread, averageMs 0.33"
"sourceSize 10000, name forEach, averageMs 0.08"
"sourceSize 10000, name for, averageMs 0.05"
"sourceSize 10000, name forAlloc, averageMs 0.04"
"sourceSize 100000, name pushApply, averageMs 3.85"
"sourceSize 100000, name spread, averageMs 3.62"
"sourceSize 100000, name forEach, averageMs 0.86"
"sourceSize 100000, name for, averageMs 0.38"
"sourceSize 100000, name forAlloc, averageMs 0.34"
"sourceSize 1000000, name pushApply, ERROR"
"sourceSize 1000000, name spread, ERROR"
"sourceSize 1000000, name forEach, averageMs 9.28"
"sourceSize 1000000, name for, averageMs 5.04"
"sourceSize 1000000, name forAlloc, averageMs 5.58"
For small arrays, it does not seem to matter in terms of performance.
For arrays of size >~150000, the push.apply and ...spread cause errors, so they are out of the game anyhow. But even in the cases where they still work, they are noticably slower than the alternatives.
For large arrays, the index-based for and foreach seem to have roughly the same performance in NodeJS and Chrome. In FireFox, the foreach one is noticably slower.
The "index-based-for-with-preallocated-size" seems to consistently be the fastest (the 5.58 vs. 5.04 on FireFox is "noise"). In fact, for larger arrays on NodeJS and Chrome, this one seems to be 3-5 times faster than the standard for/push approach.
tl;dr (too late for that, I know 🙂 )
I'd second your suggestion to introduce a function like this
function pushAll(source, target) {
const t = target.length;
target.length = t + source.length;
for (let i=0; i<source.length; i++){
target[t + i] = source[i];
}
}
That can be used in all places where the contents of one array has to be added to another (i.e. at least all places that are currently affected by this PR - and probably some more, where this already was done with a manual, inlined for loop)
To be confirmed by the CesiumJS core squad.
- On the one hand, I don't want to hold up your PR
I could have emailed you about it, I was genuinely curious about your opinion on that. Not only for this project, but for my own development structure. I appreciate your thorough responses.
As long as there's not an existing method which does the equivalent (it sounds like we want the concat method, but without creating a new array each time), I see no harm in adding a helper function to the general API.
I always feel like a complete noob when I have to do searches that lead to things like https://stackoverflow.com/questions/4156101/copy-array-items-into-another-array . But ... inside that usual stack overflow noise, there are some hints that seem to confirm that push.apply/...spread are simple but problematic, concat is simple and fast (but not what we want here), and a custom utility function that preallocates the output seems to be the most versatile for what we want to do here.
So unless someone disagrees, there will probably be some function for this. This would reside on the same level as defined and defaultValue. And... similar to the latter, we can hope that it can one day be replaced by one efficient, idiomatic, built-in functionality.
We could now talk about details.... consider this as brainstorming:
- It could be called
pushAll, but... depending on the implementation, it's not "pushing", so I'd call itaddAll - It could transparently handle the case of
source===undefinedby doing nothing (?) - ...
I've only been half following this thread so if I missed the answer already then ignore me.
concat is simple and fast (but not what we want here)
Is there a reason we can't replace push.apply with concat and just reassign variables? I wouldn't expect anything this was previously used on to be anything but a pure Array so I wouldn't expect references to be broken or anything.
arr1.push.apply(arr1, arr2);
// could just be
arr1 = arr1.concat(arr2);
This probably wouldn't work in a helper function but I don't think it would even need the helper function if we went this route. With a very naive test this is faster than even the pre-allocated solution above, it just requires a slight rework to how we're using these variables and maybe converting some from const to let
sourceSize 100, name forAlloc, averageMs 0.006000000238418579
sourceSize 100, name concat, averageMs 0.0029999995231628418
sourceSize 1000, name forAlloc, averageMs 0.006000000834465027
sourceSize 1000, name concat, averageMs 0.0010000002384185792
sourceSize 10000, name forAlloc, averageMs 0.02000000059604645
sourceSize 10000, name concat, averageMs 0.005
sourceSize 100000, name forAlloc, averageMs 0.31600000500679015
sourceSize 100000, name concat, averageMs 0.1809999990463257
sourceSize 1000000, name forAlloc, averageMs 4.498000001907348
sourceSize 1000000, name concat, averageMs 3.0629999989271166
@jjspace I'm kind of surprised that concat is faster than the other approaches. It's difficult to find "the truth" here (and quick websearches bring up things like https://dev.to/uilicious/javascript-array-push-is-945x-faster-than-array-concat-1oki 🥴 ). But I tested it on NodeJS, Firefox, and Chrome, and it seems to be consistent. (Not much in many cases, but consistent).
Somehow related to the concat-approach: When I listed the "details" bullet points in the last comment, I actually considered this:
- Should
addAllreturn the target array?
But I didn't propose this, because it would raise several tricky follow-up questions.
First of all: There certainly are several places where concat would be the easiest, cleanest, and (apparently) most efficient solution. And it could be worthwhile to examine the places that currently use push.apply in that regard. Many of them can probably be changed easily.
But as a general recommendation, the concern that I have with concat is simple: We don't have the slightest clue about "ownership"!
Imagine that there is some code like this (dummy/pseudocode):
function processSomething(frameState) {
const targetArray = frameState.targetArray;
addAll(createSourcesA(), targetArray);
addAll(createSourcesB(), targetArray);
addAll(createSourcesC(), targetArray);
}
(where addAll is the function that is about to be introduced).
So far, so good. It adds a bunch of sources to the targetArray that is stored in the FrameState.
Now, concat has a drastically different behavior. It returns a new array. So this will have to be...
function processSomething(frameState) {
const targetArray = frameState.targetArray;
let newTargetArray = targetArray;
newTargetArray = newTargetArray.concat(createSourcesA());
newTargetArray = newTargetArray.concat(createSourcesB());
newTargetArray = newTargetArray.concat(createSourcesC());
// THIS IS IMPORTANT:
frameState.targetArray = newTargetArray:
}
We can talk about details of the inner part. And great care is necessary to not accidentally omit that last line.
But even then: You simply never know whether at some point in time, some part of the code, for some reason did
this.stuffFromFrameState = frameState.targetArray;
And this array will then never be updated. (It is actually possible to write code in a way where this is not a concern. But that code is usually not JavaScript ;-) ).
So my two cents:
- Yes, we can consider to simply use
concat, but have to be very careful that this doesn't break anything - For some existing patterns, I think that such an
addAllfunction would still be highly beneficial
@Tim-Quattrochi Sorry, this discussion may have detailed a bit 😁 What started as an "uncontroversial, pattern-based boilerplate code change" now touches some of the guts of the language and library. But I think that the potential improvement is worth these considerations.
My biggest concern about always using concat is that is always creates a new array instead of mutating the first one in place. This can cause memory pressure and performance issues, so we should be intentional about when we create new objects or not.
It's not entirely clear how to proceed here. This is a PR that affects "many" files, and there has been a linting update in the meanime, which affects ... "all files", basically - meaning that there are a bunch of conflicts. There are instructions for updating such a diverged branch. But I think that it might be easier to...
- Create that
addAllfunction (similar todefined.js) - Start from the current
mainbranch state (which already has all the linting changes) - Mechanically change the
target.push.apply(target, source);calls toaddAll(target, source);- (This will require less thought than the previous change: All the subtle questions about "storing the
length" or "not repeatedly callinggetSources()do not apply here - and that's exactly why I think that this would be a good thing to do!)
- (This will require less thought than the previous change: All the subtle questions about "storing the
- Create a new PR for that
But that's only a gut feeling right now. @Tim-Quattrochi if you instead want to update the branch with the linter changes and extend this PR, that should also be possible.
if you instead want to update the branch with the linter changes and extend this PR
Can you elaborate on what you mean by linter changes?
CesiumJS uses prettier for automated code formatting. Before committing something, running npm run prettier cleans up the code formatting, removes unnecessary spaces, fixes the indentation etc. But recently, the version of 'prettier' that is used in CesiumJS has been updated (via #12206). One side effect of this change is that "all files are modified". This can lead to conflicts (as indicated by the list of "Conflicting files" currently shown below this PR).
Now, there are two options:
- You could go through the process described in https://github.com/CesiumGS/cesium/pull/12206 to update this branch and resolve the conflicts
- As described above: You could address the issue on a new branch (branched off from the current
main), and create a new PR for that
Option 1. requires a bit of twiddling with git, and may still require to resolve some conflicts manually. So I think that option 2. could be easier and make more sense: The suggestion to introduce a utility function for the specific functionality here (namely, "add all") is a good one, and allows for a much cleaner way of addressing the original issue.
Hi @Tim-Quattrochi and @javagl, what is remaining in this PR? Is it only to merge in main and account for prettier?
@ggetz The last thing here was that we considered adding an addAll function that provides an efficient and safe way of doing what was formerly done with push.apply, with the two options:
- doing this in this PR (changing all the existing changes)
- creating a fresh PR that dedicatedly introduces this
addAll, and updates all appearances ofpush.applymechanically to use that function
For 1, dealing with prettier and conflicts can be a pain in the back, so I think that 2. should be easier. The decision is left to @Tim-Quattrochi for now (but if Tim does not have the bandwidth to do either, then I'd put 2. on my TODO list - I think that the suggestion by Tim (to introduce such a function to foster DRY) is a good one, and shouldn't be sooo much effort),
Hi @Tim-Quattrochi and @javagl, what is remaining in this PR? Is it only to merge in main and account for prettier?
Hi sorry, I need to lint and possibly a helper. I put it on draft because I was dealing with hurricane Milton.
Hi sorry, I need to lint and possibly a helper. I put it on draft because I was dealing with hurricane Milton.
Understandable! Hope you are safe!
@javagl If you do have the bandwidth to help out here, that would be appreciated.
@Tim-Quattrochi We can set up a call whenever it suits you.
(EDIT: Unless we can sort out the questions via comments here, that's also fine...)
Closing in favor of https://github.com/CesiumGS/cesium/pull/12361