js-framework-benchmark Make vanillajs score fastest on chrome 124 (again)

With chrome 124 the vanillajs and vanillajs-1 implementation don't score fastest. Currently it's not clear whether actually some change favors other frameworks or it's a bug with chrome 124 traces. Here's the current result: I'd believe chrome 124 results if we could create a new vanillajs implementation that becomes the fastest implementation again. The first goal would be an implementation that runs fastest in chrome 124 for create 1,000 rows, replace all rows and create 10,000 rows. It would be really appreciated to get vanillajs PRs. I promise to run them in a timely fashion. Thanks.

May 02 '24 17:05 krausest

#1633 Here is written about an assumption to increase the speed of work. I think that after the vanilla-2 js version I will need to close this issue.

May 02 '24 18:05 antonmak1

I still don't understand the meaning of e.preventDefault() for each button. It seems to me that stopPropagation() will be much more effective in this sense (At least I checked it, so I think so).

May 02 '24 18:05 antonmak1

Thanks. I think I applied the suggestions for vanillajs-2:

Spaces in the tr template such that createRow can change nodeValue instead of textContent
Clear table by cloning the tbody
(not relevant for the benchmarks below): Use nodeValue instead of data when updating a row

Ivi is still faster for create 1k rows, it's a tie for replace rows and vanillajs-2 is faster for create 10k rows.

May 03 '24 16:05 krausest

the click event listener now checks directly the id
clear row was switched back to tbody.textContent = "" Results:

May 03 '24 18:05 krausest

You can try some more things:

Inline appendRows function. The reason behind this is that ivi is doing a lot of inlining so theoretically it could get some optimization from that.
Append all rows in one batch by building the tbody beforehand (don't blame me, the jar was already open 😅)

Everything put together:

// top level inline function
const appendInline = (function () {
    var rows = this.rows, s_data = this.store.data, data = this.data, tbody = this.tbody;
    const empty = !tbody.firstChild;
    empty && tbody.remove();
    for(let i=rows.length;i<s_data.length; i++) {
        let tr = this.createRow(s_data[i]);
        rows[i] = tr;
        data[i] = s_data[i];
        tbody.appendChild(tr);
    }
    empty && this.table.insertBefore(tbody, null);
});

    appendRows() {
        appendInline.call(this);
    }

May 03 '24 19:05 syduki

Here's vanillajs with appendInline and the other optimizations applied: Screenshot 2024-05-03 at 22 25 03 Good improvement for create 10k rows. Sadly create 1k rows is stuck and ivi is still ahed.

May 03 '24 20:05 krausest

Sorry, I have had my hands full the whole time. My implementation in Eventiveness uses the StringBuilder pattern from Java to greatly improve the performance for adding rows, especially for the 10K rows. It's like this:

Compose an array containing the outerHTML of all the rows. I would have preferred a generator here if there was a standalone join function.
Join the array at once. This is faster than building up the string incrementally.
Create a fragment from the joined string
Append the fragment to the tbody.

It is hard to imagine another implementation should be faster than this. Most of the heavy lifting is left to the engine. I will add that faster for 10K rows is always better than for 1k rows. It suggests better scalability.

May 04 '24 05:05 mksunny1

it is less performant to store the element template in a 'global' string than just inlining it directly into the 'for' loop used to generate the item markups. I reworked my 'framework' to perform similarly to the inline case.

May 04 '24 05:05 mksunny1

Also please don't use multiple calls to push when composing the array. Use a generator and destructure into an array with [...myGenerator]. I forgot this detail earlier.

May 04 '24 05:05 mksunny1

Another difference that comes to my mind is that ivi uses direct calls to DOM methods to avoid megamorphic callsites [1][2] , but vanillajs implementation doesn't have megamorphic callsites, so switching to direct calls should make vanilla slower.

Also, ivi doesn't use any type of event delegation and there are 2 addEventListener() calls per row.

https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L41-L80
https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L589-L593

May 04 '24 07:05 localvoid

Thanks so far. I think I can show my concerns with chrome 124 in the following two charts: This is the duration for rendering. Chrome 124 reports about 2.5 msecs faster rendering duration (and this can indeed be validated in the traces). That wasn't the case with chrome 123: Why is chrome 124 rendering for ivi faster than the others? (Or as for my concerns: Why is chrome reporting faster rendering durations for ivi? I'm still not convinced that those are real.)

The difference in script duration isn't what makes ivi fastest:

May 04 '24 08:05 krausest

One example trace for create 1k rows for ivi: Screenshot 2024-05-04 at 10 28 13 Total duration: 34.91 msecs Script duraton: 3.18 msecs Rendering: 30 msesc (9.71 msecs recalculate style, 16.87 msecs layout, 3.22 msecs Pre-Paint, 1.11 msecs Paint)

One example trace for create 1k for vanillajs: Screenshot 2024-05-04 at 10 28 36 Total duration: 38.29 msecs Script duraton: 3.08 msecs Rendering: 34 msesc (10.03 msecs recalculate style, 19.66 msecs layout, 3.45 msecs Pre-Paint, 1.04 msecs Paint)

@localvoid Any ideas why chrome would perform layout faster for ivi?

May 04 '24 08:05 krausest

Trying to find any differences in the DOM output. Maybe because <a> elements has classes lbl and remove, but there aren't any css rules for this classes, so it is highly unlikely. https://github.com/krausest/js-framework-benchmark/blob/f3e82b3ed4cb47edfab85d072adc81f274318239/frameworks/keyed/vanillajs-2/src/Main.js#L9

May 04 '24 08:05 localvoid

@localvoid Good catch, but it's not causing that effect. This is create 1k rows for ivi with the lbl and remove classes (I'll commit that too) and vanillajs without: Overall a bad run, but ivi still renders 2.5 msecs faster (says chrome 124)

May 04 '24 17:05 krausest

I'll commit that too

I think that this classes (lbl, remove) are used in vanilla because of event delegation. Originally, all implementations like React, etc didn't have this classes: https://github.com/krausest/js-framework-benchmark/blob/970b34f7704b6dbd496533fa25314c694cb3d05f/frameworks/keyed/react-classes/src/main.jsx#L57-L68

May 04 '24 23:05 localvoid

@krausest The bubbling process of selectRow and deleteRow can be stopped on tr by stopPropagation. That is, do not do stopPropagation immediately on a click, but do it only after reaching the 'parentNode of the function' on tr. I don’t know how to do it in vanilla yet. This will speed up the work.

May 05 '24 07:05 antonmak1

@krausest I don't know, you can just take tr and set the cancelBubble property to true. But this is not done this way - you need to use stopPropagation somehow.

May 05 '24 07:05 antonmak1

@localvoid You're right. I reverted ivi and removed the classes from the vanillajs* implementations. The result (total duration) still looks like that:

Now what's interesting is the results for create 1k rows without warmup (i.e. initBenchmark just checking the existence of #run, no create / clear cycle): And even more interesting is the rendering duration without warmup: A flatline, which really makes sense (here...).

May 05 '24 17:05 krausest

@krausest I like this last chart. I think you are looking for a more consistent measurement scheme, that can weed out the noise and measure the most important things about how the frameworks build DOMs. Will a vanillajs implementation that scores faster now still suffice for the long run?

May 06 '24 05:05 mksunny1

It could be worth it to make the measurements more fine-grained here. Some aspects may then be more susceptible to browser implementation changes than others. Many will remain consistent, especially over a short period.

May 06 '24 05:05 mksunny1

@krausest the delta in the number of nodes in the trace you showed above don't seem to match. Vanilla is 487 starting and 11488 total (11001 new nodes) vs ivi with 40102 starting and 50103 total (10001 new nodes). That's 1000 less new nodes for ivi, there's also the issue of why ivi was starting with so many nodes. Maybe GC didn't collect the nodes from the warm-up, not sure if that would have an impact, but it might.

May 06 '24 05:05 robbiespeed

@robbiespeed good catch, indeed ivi stands out that the dom nodes created by initBenchmark are still reported by chrome (in contrast to vanillajs, doohtml and solid). @localvoid is that intentional?

May 06 '24 19:05 krausest

May have figured out the reason for the different node deltas. There's a bit of a hint if you look at the graph somewhere during layout for VanillaJS chrome will create 1k new nodes for the ::before pseudo element of each row for the remove glyph (removing the glyph classes will make the node count a consistent 10 nodes per row). My guess chrome skips creating new nodes for the pseudo elements if there were old ones still around to "recycle".

It would be interesting to see how the benchmark fairs when removing the css that creates the pseudo elements.

May 06 '24 21:05 robbiespeed

@krausest No, I don't have a clue why this is happening. ivi doesn't have any type of DOM or js object recycling.

May 07 '24 02:05 localvoid

Ran vanillajs against ivi with and without the glyphs. Removing the glyphs didn't help close the perf gap.

May 07 '24 19:05 robbiespeed

@robbiespeed Maybe we're really on to something Seems like forceGC() which called window.gc() a few times didn't clean the dom nodes for ivi. If we run window.gc() and HeapProfiler.collectGarbage" the node count goes down: Screenshot 2024-05-07 at 20 58 40

And even more, the performance is closer to what we'd expect: And performance doesn't change wildly when a 500 msec delay is added:

Now I'll run the benchmarks above in a rather clean run with the modified forceGC call. I'll report back.

May 07 '24 19:05 krausest

Here's the result with the modified forceGC method:

Looks much more reasonable to me:

Adding a delay doesn't yield a different order
Much closer to manual testing

May 07 '24 20:05 krausest

With 1.000 msecs delay the graph looks pretty similar, which is really good news!

May 07 '24 20:05 krausest

Conclusion:

Somehow calling 7x window.gc() didn't clean up the dom nodes for ivi.
Running window.gc() and HeapProfiler.collectGarbage in a loop with seven iterations clears the dom nodes for ivi
With the nodes gc'ed ivi performs similar to chrome 123 and similar to manual tracing (Which leaves one question: Why is ivi much faster when the nodes are no gc'ed? Is there some kind of caching in chrome 124?)
This means I'm pretty confident we can get reasonable numbers for chrome 124 for all create actions. (There's might be still the remove row issue which starting with chrome 124 gives frameworks with RAF an advantage, but I'll save that for the next days...)

@paulirish Whenever it comes to GC we need your help. Running windows.gc() in a loop was your recommendation the last time. Is this still what you'd recommend? Any idea why HeapProfiler.collectGarbage collects more nodes for ivi than windows.gc() alone?

Update: Just checked two things:

Indeed with chrome 123 window.gc cleared the dom nodes for ivi
Increasing the loop count doesn't help when running only window.gc() for chrome 124. HeapProfiler.collectGarbage is needed to gc the nodes for ivi.

Chrome 123 with ivi and 7x window.gc. Node count drops after the gc calls from ~30k to 66. The next create rows call adds ~10k nodes. Screenshot 2024-05-07 at 23 10 43

Chrome 124 with ivi and 14x window.gc. Node count stays at 30k nodes and rises to 40k rows when create rows is performed: Screenshot 2024-05-07 at 23 13 03

Chrome 124 with 7x window.gc and HeapProfiler.collectGarbage. Node count starts with 66 (!) and rises to 10k rows: Screenshot 2024-05-07 at 23 16 00 What's still odd here is that the trace starts with 66 nodes and not with something like 30k to 50k nodes from initBenchmark. I checked all 15 traces for one benchmark run and every trace started with 66 nodes 🤯 . I don't understand that, but 10k at the end makes sense.

May 07 '24 21:05 krausest

@krausest try window.gc({type:'major',execution:'sync',flavor:'last-resort'}).

This is what the results I get when doing 7 gc with default params ({type:'major',execution:'sync'}): Screenshot from 2024-05-07 22-21-54

vs {type:'major',execution:'sync',flavor:'last-resort'}: Screenshot from 2024-05-07 22-19-38 notice vanilla and ivi swapped position and colour

I'm not entirely sure of the differences of last-resort GC, but it seems to be a more thorough GC pass meant for when the process has reached it's memory limits. I suspect HeapProfiler.collectGarbage is doing a last-resort GC internally.

gc options are documented in v8 source

May 08 '24 01:05 robbiespeed

js-framework-benchmark js-framework-benchmark copied to clipboard

Make vanillajs score fastest on chrome 124 (again)

js-framework-benchmark
js-framework-benchmark copied to clipboard