cesium icon indicating copy to clipboard operation
cesium copied to clipboard

Issue 12053- CesiumJS should not use push.apply

Open Tim-Quattrochi opened this issue 1 year ago • 17 comments

Description

Change push.apply to use for loops to avoid exceeding the maximum call stack.

Issue number and link

#12053

Testing plan

Author checklist

  • [x] I have submitted a Contributor License Agreement
  • [x] I have added my name to CONTRIBUTORS.md
  • [x] I have updated CHANGES.md with a short summary of my change
  • [x] I have added or updated unit tests to ensure consistent code coverage
  • [x] I have updated the inline documentation, and included code examples where relevant
  • [x] I have performed a self-review of my code

Tim-Quattrochi avatar Aug 31 '24 01:08 Tim-Quattrochi

Thank you for the pull request, @Tim-Quattrochi!

:white_check_mark: We can confirm we have a CLA on file for you.

github-actions[bot] avatar Aug 31 '24 01:08 github-actions[bot]

Thanks @Tim-Quattrochi!

@javagl Are you available to review?

ggetz avatar Sep 03 '24 13:09 ggetz

Not to distract from the main point here, but:

the pattern that is often used in the CesiumJS source code, for exactly this reason, namely, to even pull out the length.

I have actually been changing this pattern… because:

  • it is extra mental overhead when reading the code
  • The pattern for (let i = 0; i < array.length; i++) is so boilerplate that the .length accessor is already optimized out (stored in a register) by modern JIT compilers (at least on every recent benchmark I have seen).

Having said that, I agree that exampleObject.getSources() should be precomputed. The compiler will not know what to do with custom getters.

jjhembd avatar Sep 06 '24 00:09 jjhembd

@jjhembd I agree that not storing the array.length can increase readability (and avoid certain bugs). The performance was once said to be the reason for pulling it out.

Reliable benchmarks can be difficult, and the JIT is just one very large black box. E.g. does it only store that length in a register when it is actually an array, or also when length is a custom property? Or when it is a 'getter' for a custom property that was determined to not be affected by the loop...? In any case, it's a pity that the Iterable construct does not seem to be JITted. The results in https://github.com/CesiumGS/cesium/pull/11697#issuecomment-1889968947 have been pretty disappointing in that regard...


EDIT: I already mentioned in another comment that we might consider updating the coding guide to no longer recommend the const n = array.length pattern

javagl avatar Sep 06 '24 11:09 javagl

@jjhembd I agree that not storing the array.length can increase readability (and avoid certain bugs). The performance was once said to be the reason for pulling it out.

Reliable benchmarks can be difficult, and the JIT is just one very large black box. E.g. does it only store that length in a register when it is actually an array, or also when length is a custom property? Or when it is a 'getter' for a custom property that was determined to not be affected by the loop...? In any case, it's a pity that the Iterable construct does not seem to be JITted. The results in #11697 (comment) have been pretty disappointing in that regard...

EDIT: I already mentioned in another comment that we might consider updating the coding guide to no longer recommend the const n = array.length pattern

@javagl If we were to keep the current pattern, how would you ensure it stays DRY? My thought was to create a utility function or perhaps a method, but I wanted to get your opinion. Repeating the length variable and source for each conditional seems like an anti-pattern to me.

Tim-Quattrochi avatar Sep 26 '24 18:09 Tim-Quattrochi

From quickly scrolling over the changes, it seems like the inlined comments have been addressed. I'll do a (probably final) review pass early tomorrow (and just for completeness, try out the "artificial" model that I created to provoke the error).

About storing the length: (I'm not even sure whether that question was addressed at me): I don't have a strong opinion there. The main point was about storing things like the getSources() (in the example). And I mentioned the pattern of storing the length as const n = something.length; mainly because that was used in most of the codebase, and even in snippets in the coding guide. But using the something.length directly in the loop is also fine for me.

@jjhembd Do you think that new/modified code should consistently use the pattern without storing the length?

The strong(est) case that I could make against storing the length is when there are "many" arrays and loops in one method, and one has to distinguish the names (i.e. when it cannot be n, but has to be array0Length.. arrayNLength). Is this what your comment about DRY referred to? I'm not sure how a utility function/method could help here...

It would be great if these considerations could be obsolete: I think that the for (const element of array) should be the idiomatic code. (You actually used this in one place - not sure whether that was intentional). Unfortunately, JavaScript compilers/JITs don't seem to be very smart, and these loops seem to be much slower, even though they could trivially be translated into the index-based ones by the compiler or JIT...

javagl avatar Sep 26 '24 21:09 javagl

About storing the length: (I'm not even sure whether that question was addressed at me): I don't have a strong opinion there. The main point was about storing things like the getSources() (in the example). And I mentioned the pattern of storing the length as const n = something.length; mainly because that was used in most of the codebase, and even in snippets in the coding guide. But using the something.length directly in the loop is also fine for me.

@javagl Yes it was, sorry I '@' the wrong person. I was just referring to the fact that I have to repeat myself a lot. It was a general best practice question, not necessarily something I am introducing. For example, I have to create a variable to point at the source, create a target, create a variable to hold the length of the source. This can happen a lot in a method that has more conditions. An example is here: packages/engine/Source/Scene/Cesium3DTileBatchTable.js lines 375, 376, then again 382, 387. Would it be good practice to create a function to:

  • create a reference to the source.
  • create a variable for the length of the source.
  • create a target.
  • loop through and push.
  • return the result.

Again, this is just a general question and I am not suggesting Cesium do this.

It would be great if these considerations could be obsolete: I think that the for (const element of array) should be the idiomatic code. (You actually used this in one place

I don't believe I refactored those after changes requested. I can have them changed and pushed to this PR before you do a review, and yes I will absolutely try out the artificial model that you created.

Tim-Quattrochi avatar Sep 26 '24 22:09 Tim-Quattrochi

Would it be good practice to create a function to: ...

I misunderstood that first. I thought that you referred to a function that can "always" replace such a for-loop, in the most generic form. But now I see that you only referred to the push.apply case here.

And indeed, such a function could make sense, and it could be pretty straightforward, like

function pushAll(source, target) {
    for (let i=0; i<source.length; i++){
        target.push(source[i]);
    }
}

This would turn a code block like the one that you linked to from this

    const hierarchyPropertyIdsLength = hierarchyPropertyIds.length;
    for (let i = 0; i < hierarchyPropertyIdsLength; ++i) {
      results.push(hierarchyPropertyIds[i]);
    }

into this

pushAll(hierarchyPropertyIds, results);

One could even pre-allocate the space in the target array, to avoid repeated allocations/resize operations.


Now... I'm a bit torn:

  1. On the one hand, I don't want to hold up your PR, or let it become "obsolete" by suggesting to introduce such a function
  2. On the other hand, I'd be strongly in favor of introducing such a function
  3. (On the third hand, not being part of the code CesiumJS squad, I probably cannot make the final decision here)

But even at the risk of making most of the changes here obsolete, I'd vote for introducing such a function, because it could tremendously increase readability and maintanability (and performance), with little effort and bascially zero risk.

I just threw together some ""benchmark"", to compare a few things here. (I'm not a "JavaScript performance testing expert", so this has to be taken with a huge grain of salt...).

The benchmark creates arrays of different sizes, repeatedly puts the contents of these source arrays into target arrays, and prints the timing information. The approaches that are compared here are

  • The push.apply (causing errors for larger arrays)
  • The ...spread operator (also causing errors)
  • An index-based for loop (the approach that is now used in this PR)
  • A foreach loop (just for comparison)
  • An index-based for loop with preallocated target array size, as it could be done easily and transparently(!) in such a function

(Some results are given below)

The code:

function runSingleTest(sourceSize, testFunction) {
  const sourceArray = Array(sourceSize);
  const targetArray = [];
  const startMs = performance.now();
  try {
    testFunction(sourceArray, targetArray);
  } catch (errror) {
    return undefined;
  }
  const endMs = performance.now();
  const durationMs = endMs - startMs;
  return durationMs;
}

function runTest(runs, sourceSize, name, testFunction) {
  let sumMs = 0.0;
  for (r = 0; r < runs; r++) {
    const ms = runSingleTest(sourceSize, testFunction);
    if (ms === undefined) {
      console.log(`sourceSize ${sourceSize}, name ${name}, ERROR`);
      return;
    }
    sumMs += ms;
  }
  const averageMs = sumMs / runs;
  console.log(`sourceSize ${sourceSize}, name ${name}, averageMs ${averageMs}`);
}

function usePushApply(sourceArray, targetArray) {
  targetArray.push.apply(targetArray, sourceArray);
}

function useSpread(sourceArray, targetArray) {
  targetArray.push(...sourceArray);
}

function useForEachLoop(sourceArray, targetArray) {
  for (const element of sourceArray) {
    targetArray.push(element);
  }
}

function useForLoop(sourceArray, targetArray) {
  for (let i = 0; i < sourceArray.length; i++) {
    targetArray.push(sourceArray[i]);
  }
}

function useForLoopPreallocated(sourceArray, targetArray) {
  const t = targetArray.length;
  targetArray.length += sourceArray.length;
  for (let i = 0; i < sourceArray.length; i++) {
    targetArray[t + i] = sourceArray[i];
  }
}

function test() {
  const runs = 100;
  for (let n = 100; n <= 1000000; n *= 10) {
    runTest(runs, n, "pushApply", usePushApply);
    runTest(runs, n, "spread", useSpread);
    runTest(runs, n, "forEach", useForEachLoop);
    runTest(runs, n, "for", useForLoop);
    runTest(runs, n, "forAlloc", useForLoopPreallocated);
  }
}

try {
  test();
} catch (error) {
  console.log(error);
}

Some results (a summary will be given below):


On NodeJS:

sourceSize 100, name pushApply, averageMs 0.0016129999561235308
sourceSize 100, name spread, averageMs 0.006134000001475215
sourceSize 100, name forEach, averageMs 0.018598999972455205
sourceSize 100, name for, averageMs 0.007068000007420778
sourceSize 100, name forAlloc, averageMs 0.0077609999664127825
sourceSize 1000, name pushApply, averageMs 0.005562999998219311
sourceSize 1000, name spread, averageMs 0.003792000007815659
sourceSize 1000, name forEach, averageMs 0.0369120000069961
sourceSize 1000, name for, averageMs 0.02806300004012883
sourceSize 1000, name forAlloc, averageMs 0.0023279999662190677
sourceSize 10000, name pushApply, averageMs 0.019967000023461877
sourceSize 10000, name spread, averageMs 0.019638000009581446
sourceSize 10000, name forEach, averageMs 0.038648000014945866
sourceSize 10000, name for, averageMs 0.038686000015586613
sourceSize 10000, name forAlloc, averageMs 0.012039000024087728
sourceSize 100000, name pushApply, averageMs 1.8059109999565408
sourceSize 100000, name spread, averageMs 1.744411000018008
sourceSize 100000, name forEach, averageMs 1.2372270000027492
sourceSize 100000, name for, averageMs 1.229792999974452
sourceSize 100000, name forAlloc, averageMs 0.39260800000280144
sourceSize 1000000, name pushApply, ERROR
sourceSize 1000000, name spread, ERROR
sourceSize 1000000, name forEach, averageMs 21.47643099999521
sourceSize 1000000, name for, averageMs 21.594332000017165
sourceSize 1000000, name forAlloc, averageMs 4.8390970000252125

On Chrome in jsfiddle

"sourceSize 100, name pushApply, averageMs 0"
"sourceSize 100, name spread, averageMs 0"
"sourceSize 100, name forEach, averageMs 0.006999999997206032"
"sourceSize 100, name for, averageMs 0.003000000002793968"
"sourceSize 100, name forAlloc, averageMs 0.00400000000372529"
"sourceSize 1000, name pushApply, averageMs 0.002999999998137355"
"sourceSize 1000, name spread, averageMs 0.003999999999068677"
"sourceSize 1000, name forEach, averageMs 0.01600000000093132"
"sourceSize 1000, name for, averageMs 0.00400000000372529"
"sourceSize 1000, name forAlloc, averageMs 0.003999999999068677"
"sourceSize 10000, name pushApply, averageMs 0.01899999999906868"
"sourceSize 10000, name spread, averageMs 0.017999999998137353"
"sourceSize 10000, name forEach, averageMs 0.04199999999720603"
"sourceSize 10000, name for, averageMs 0.03300000000279397"
"sourceSize 10000, name forAlloc, averageMs 0.011999999997206032"
"sourceSize 100000, name pushApply, averageMs 1.2229999999841674"
"sourceSize 100000, name spread, averageMs 1.1540000000130386"
"sourceSize 100000, name forEach, averageMs 0.6920000000204891"
"sourceSize 100000, name for, averageMs 0.741999999973923"
"sourceSize 100000, name forAlloc, averageMs 0.1800000000046566"
"sourceSize 1000000, name pushApply, ERROR"
"sourceSize 1000000, name spread, ERROR"
"sourceSize 1000000, name forEach, averageMs 12.13999999998603"
"sourceSize 1000000, name for, averageMs 12.027999999970199"
"sourceSize 1000000, name forAlloc, averageMs 2.443000000016764"

On Firefox in jsfiddle:

"sourceSize 100, name pushApply, averageMs 0.01"
"sourceSize 100, name spread, averageMs 0.01"
"sourceSize 100, name forEach, averageMs 0"
"sourceSize 100, name for, averageMs 0"
"sourceSize 100, name forAlloc, averageMs 0.01"
"sourceSize 1000, name pushApply, averageMs 0.06"
"sourceSize 1000, name spread, averageMs 0.03"
"sourceSize 1000, name forEach, averageMs 0"
"sourceSize 1000, name for, averageMs 0"
"sourceSize 1000, name forAlloc, averageMs 0"
"sourceSize 10000, name pushApply, averageMs 0.36"
"sourceSize 10000, name spread, averageMs 0.33"
"sourceSize 10000, name forEach, averageMs 0.08"
"sourceSize 10000, name for, averageMs 0.05"
"sourceSize 10000, name forAlloc, averageMs 0.04"
"sourceSize 100000, name pushApply, averageMs 3.85"
"sourceSize 100000, name spread, averageMs 3.62"
"sourceSize 100000, name forEach, averageMs 0.86"
"sourceSize 100000, name for, averageMs 0.38"
"sourceSize 100000, name forAlloc, averageMs 0.34"
"sourceSize 1000000, name pushApply, ERROR"
"sourceSize 1000000, name spread, ERROR"
"sourceSize 1000000, name forEach, averageMs 9.28"
"sourceSize 1000000, name for, averageMs 5.04"
"sourceSize 1000000, name forAlloc, averageMs 5.58"

For small arrays, it does not seem to matter in terms of performance.

For arrays of size >~150000, the push.apply and ...spread cause errors, so they are out of the game anyhow. But even in the cases where they still work, they are noticably slower than the alternatives.

For large arrays, the index-based for and foreach seem to have roughly the same performance in NodeJS and Chrome. In FireFox, the foreach one is noticably slower.

The "index-based-for-with-preallocated-size" seems to consistently be the fastest (the 5.58 vs. 5.04 on FireFox is "noise"). In fact, for larger arrays on NodeJS and Chrome, this one seems to be 3-5 times faster than the standard for/push approach.


tl;dr (too late for that, I know 🙂 )

I'd second your suggestion to introduce a function like this

function pushAll(source, target) {
    const t = target.length;
    target.length = t + source.length;
    for (let i=0; i<source.length; i++){
        target[t + i] = source[i];
    }
}

That can be used in all places where the contents of one array has to be added to another (i.e. at least all places that are currently affected by this PR - and probably some more, where this already was done with a manual, inlined for loop)

To be confirmed by the CesiumJS core squad.

javagl avatar Sep 27 '24 12:09 javagl

  1. On the one hand, I don't want to hold up your PR

I could have emailed you about it, I was genuinely curious about your opinion on that. Not only for this project, but for my own development structure. I appreciate your thorough responses.

Tim-Quattrochi avatar Sep 27 '24 12:09 Tim-Quattrochi

As long as there's not an existing method which does the equivalent (it sounds like we want the concat method, but without creating a new array each time), I see no harm in adding a helper function to the general API.

ggetz avatar Sep 27 '24 12:09 ggetz

I always feel like a complete noob when I have to do searches that lead to things like https://stackoverflow.com/questions/4156101/copy-array-items-into-another-array . But ... inside that usual stack overflow noise, there are some hints that seem to confirm that push.apply/...spread are simple but problematic, concat is simple and fast (but not what we want here), and a custom utility function that preallocates the output seems to be the most versatile for what we want to do here.

So unless someone disagrees, there will probably be some function for this. This would reside on the same level as defined and defaultValue. And... similar to the latter, we can hope that it can one day be replaced by one efficient, idiomatic, built-in functionality.

We could now talk about details.... consider this as brainstorming:

  • It could be called pushAll, but... depending on the implementation, it's not "pushing", so I'd call it addAll
  • It could transparently handle the case of source===undefined by doing nothing (?)
  • ...

javagl avatar Sep 27 '24 13:09 javagl

I've only been half following this thread so if I missed the answer already then ignore me.

concat is simple and fast (but not what we want here)

Is there a reason we can't replace push.apply with concat and just reassign variables? I wouldn't expect anything this was previously used on to be anything but a pure Array so I wouldn't expect references to be broken or anything.

arr1.push.apply(arr1, arr2);
// could just be
arr1 = arr1.concat(arr2);

This probably wouldn't work in a helper function but I don't think it would even need the helper function if we went this route. With a very naive test this is faster than even the pre-allocated solution above, it just requires a slight rework to how we're using these variables and maybe converting some from const to let

sourceSize 100, name forAlloc, averageMs 0.006000000238418579
sourceSize 100, name concat, averageMs 0.0029999995231628418
sourceSize 1000, name forAlloc, averageMs 0.006000000834465027
sourceSize 1000, name concat, averageMs 0.0010000002384185792
sourceSize 10000, name forAlloc, averageMs 0.02000000059604645
sourceSize 10000, name concat, averageMs 0.005
sourceSize 100000, name forAlloc, averageMs 0.31600000500679015
sourceSize 100000, name concat, averageMs 0.1809999990463257
sourceSize 1000000, name forAlloc, averageMs 4.498000001907348
sourceSize 1000000, name concat, averageMs 3.0629999989271166

jjspace avatar Sep 27 '24 14:09 jjspace

@jjspace I'm kind of surprised that concat is faster than the other approaches. It's difficult to find "the truth" here (and quick websearches bring up things like https://dev.to/uilicious/javascript-array-push-is-945x-faster-than-array-concat-1oki 🥴 ). But I tested it on NodeJS, Firefox, and Chrome, and it seems to be consistent. (Not much in many cases, but consistent).

Somehow related to the concat-approach: When I listed the "details" bullet points in the last comment, I actually considered this:

  • Should addAll return the target array?

But I didn't propose this, because it would raise several tricky follow-up questions.


First of all: There certainly are several places where concat would be the easiest, cleanest, and (apparently) most efficient solution. And it could be worthwhile to examine the places that currently use push.apply in that regard. Many of them can probably be changed easily.

But as a general recommendation, the concern that I have with concat is simple: We don't have the slightest clue about "ownership"!

Imagine that there is some code like this (dummy/pseudocode):

function processSomething(frameState) {
    const targetArray = frameState.targetArray;
    
    addAll(createSourcesA(), targetArray);
    addAll(createSourcesB(), targetArray);
    addAll(createSourcesC(), targetArray);
}

(where addAll is the function that is about to be introduced).

So far, so good. It adds a bunch of sources to the targetArray that is stored in the FrameState.

Now, concat has a drastically different behavior. It returns a new array. So this will have to be...

function processSomething(frameState) {
    const targetArray = frameState.targetArray;
    
    let newTargetArray = targetArray;
    newTargetArray = newTargetArray.concat(createSourcesA());
    newTargetArray = newTargetArray.concat(createSourcesB());
    newTargetArray = newTargetArray.concat(createSourcesC());

    // THIS IS IMPORTANT:
    frameState.targetArray = newTargetArray:
}

We can talk about details of the inner part. And great care is necessary to not accidentally omit that last line.

But even then: You simply never know whether at some point in time, some part of the code, for some reason did this.stuffFromFrameState = frameState.targetArray;

And this array will then never be updated. (It is actually possible to write code in a way where this is not a concern. But that code is usually not JavaScript ;-) ).


So my two cents:

  • Yes, we can consider to simply use concat, but have to be very careful that this doesn't break anything
  • For some existing patterns, I think that such an addAll function would still be highly beneficial

@Tim-Quattrochi Sorry, this discussion may have detailed a bit 😁 What started as an "uncontroversial, pattern-based boilerplate code change" now touches some of the guts of the language and library. But I think that the potential improvement is worth these considerations.

javagl avatar Sep 27 '24 18:09 javagl

My biggest concern about always using concat is that is always creates a new array instead of mutating the first one in place. This can cause memory pressure and performance issues, so we should be intentional about when we create new objects or not.

ggetz avatar Sep 27 '24 18:09 ggetz

It's not entirely clear how to proceed here. This is a PR that affects "many" files, and there has been a linting update in the meanime, which affects ... "all files", basically - meaning that there are a bunch of conflicts. There are instructions for updating such a diverged branch. But I think that it might be easier to...

  • Create that addAll function (similar to defined.js)
  • Start from the current main branch state (which already has all the linting changes)
  • Mechanically change the target.push.apply(target, source); calls to addAll(target, source);
    • (This will require less thought than the previous change: All the subtle questions about "storing the length" or "not repeatedly calling getSources() do not apply here - and that's exactly why I think that this would be a good thing to do!)
  • Create a new PR for that

But that's only a gut feeling right now. @Tim-Quattrochi if you instead want to update the branch with the linter changes and extend this PR, that should also be possible.

javagl avatar Oct 01 '24 14:10 javagl

if you instead want to update the branch with the linter changes and extend this PR

Can you elaborate on what you mean by linter changes?

Tim-Quattrochi avatar Oct 07 '24 23:10 Tim-Quattrochi

CesiumJS uses prettier for automated code formatting. Before committing something, running npm run prettier cleans up the code formatting, removes unnecessary spaces, fixes the indentation etc. But recently, the version of 'prettier' that is used in CesiumJS has been updated (via #12206). One side effect of this change is that "all files are modified". This can lead to conflicts (as indicated by the list of "Conflicting files" currently shown below this PR).

Now, there are two options:

  1. You could go through the process described in https://github.com/CesiumGS/cesium/pull/12206 to update this branch and resolve the conflicts
  2. As described above: You could address the issue on a new branch (branched off from the current main), and create a new PR for that

Option 1. requires a bit of twiddling with git, and may still require to resolve some conflicts manually. So I think that option 2. could be easier and make more sense: The suggestion to introduce a utility function for the specific functionality here (namely, "add all") is a good one, and allows for a much cleaner way of addressing the original issue.

javagl avatar Oct 08 '24 13:10 javagl

Hi @Tim-Quattrochi and @javagl, what is remaining in this PR? Is it only to merge in main and account for prettier?

ggetz avatar Oct 29 '24 13:10 ggetz

@ggetz The last thing here was that we considered adding an addAll function that provides an efficient and safe way of doing what was formerly done with push.apply, with the two options:

  1. doing this in this PR (changing all the existing changes)
  2. creating a fresh PR that dedicatedly introduces this addAll, and updates all appearances of push.apply mechanically to use that function

For 1, dealing with prettier and conflicts can be a pain in the back, so I think that 2. should be easier. The decision is left to @Tim-Quattrochi for now (but if Tim does not have the bandwidth to do either, then I'd put 2. on my TODO list - I think that the suggestion by Tim (to introduce such a function to foster DRY) is a good one, and shouldn't be sooo much effort),

javagl avatar Oct 29 '24 15:10 javagl

Hi @Tim-Quattrochi and @javagl, what is remaining in this PR? Is it only to merge in main and account for prettier?

Hi sorry, I need to lint and possibly a helper. I put it on draft because I was dealing with hurricane Milton.

Tim-Quattrochi avatar Oct 29 '24 15:10 Tim-Quattrochi

Hi sorry, I need to lint and possibly a helper. I put it on draft because I was dealing with hurricane Milton.

Understandable! Hope you are safe!

@javagl If you do have the bandwidth to help out here, that would be appreciated.

ggetz avatar Oct 29 '24 17:10 ggetz

@Tim-Quattrochi We can set up a call whenever it suits you.

(EDIT: Unless we can sort out the questions via comments here, that's also fine...)

javagl avatar Oct 30 '24 15:10 javagl

Closing in favor of https://github.com/CesiumGS/cesium/pull/12361

jjspace avatar Jul 29 '25 15:07 jjspace