js-framework-benchmark
js-framework-benchmark copied to clipboard
Make vanillajs score fastest on chrome 124 (again)
With chrome 124 the vanillajs and vanillajs-1 implementation don't score fastest.
Currently it's not clear whether actually some change favors other frameworks or it's a bug with chrome 124 traces. Here's the current result:
I'd believe chrome 124 results if we could create a new vanillajs implementation that becomes the fastest implementation again.
The first goal would be an implementation that runs fastest in chrome 124 for create 1,000 rows, replace all rows and create 10,000 rows.
It would be really appreciated to get vanillajs PRs. I promise to run them in a timely fashion. Thanks.
#1633 Here is written about an assumption to increase the speed of work. I think that after the vanilla-2 js version I will need to close this issue.
I still don't understand the meaning of e.preventDefault()
for each button. It seems to me that stopPropagation()
will be much more effective in this sense (At least I checked it, so I think so).
Thanks. I think I applied the suggestions for vanillajs-2:
- Spaces in the tr template such that createRow can change nodeValue instead of textContent
- Clear table by cloning the tbody
- (not relevant for the benchmarks below): Use nodeValue instead of data when updating a row
Ivi is still faster for create 1k rows, it's a tie for replace rows and vanillajs-2 is faster for create 10k rows.
- the click event listener now checks directly the id
- clear row was switched back to tbody.textContent = ""
Results:
You can try some more things:
- Inline
appendRows
function. The reason behind this is thativi
is doing a lot of inlining so theoretically it could get some optimization from that. - Append all rows in one batch by building the
tbody
beforehand (don't blame me, the jar was already open 😅)
Everything put together:
// top level inline function
const appendInline = (function () {
var rows = this.rows, s_data = this.store.data, data = this.data, tbody = this.tbody;
const empty = !tbody.firstChild;
empty && tbody.remove();
for(let i=rows.length;i<s_data.length; i++) {
let tr = this.createRow(s_data[i]);
rows[i] = tr;
data[i] = s_data[i];
tbody.appendChild(tr);
}
empty && this.table.insertBefore(tbody, null);
});
appendRows() {
appendInline.call(this);
}
Here's vanillajs with appendInline and the other optimizations applied:
Good improvement for create 10k rows. Sadly create 1k rows is stuck and ivi is still ahed.
Sorry, I have had my hands full the whole time. My implementation in Eventiveness uses the StringBuilder pattern from Java to greatly improve the performance for adding rows, especially for the 10K rows. It's like this:
- Compose an array containing the outerHTML of all the rows. I would have preferred a generator here if there was a standalone
join
function. - Join the array at once. This is faster than building up the string incrementally.
- Create a fragment from the joined string
- Append the fragment to the tbody.
It is hard to imagine another implementation should be faster than this. Most of the heavy lifting is left to the engine. I will add that faster for 10K rows is always better than for 1k rows. It suggests better scalability.
it is less performant to store the element template in a 'global' string than just inlining it directly into the 'for' loop used to generate the item markups. I reworked my 'framework' to perform similarly to the inline case.
Also please don't use multiple calls to push
when composing the array. Use a generator and destructure into an array with [...myGenerator]. I forgot this detail earlier.
Another difference that comes to my mind is that ivi uses direct calls to DOM methods to avoid megamorphic callsites [1][2] , but vanillajs implementation doesn't have megamorphic callsites, so switching to direct calls should make vanilla slower.
Also, ivi doesn't use any type of event delegation and there are 2 addEventListener()
calls per row.
- https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L41-L80
- https://github.com/localvoid/ivi/blob/bd5bbe8c6b39a7be1051c16ea0a07b3df9a178bd/packages/ivi/src/client/core.ts#L589-L593
Thanks so far.
I think I can show my concerns with chrome 124 in the following two charts:
This is the duration for rendering. Chrome 124 reports about 2.5 msecs faster rendering duration (and this can indeed be validated in the traces).
That wasn't the case with chrome 123:
Why is chrome 124 rendering for ivi faster than the others? (Or as for my concerns: Why is chrome reporting faster rendering durations for ivi? I'm still not convinced that those are real.)
The difference in script duration isn't what makes ivi fastest:
One example trace for create 1k rows for ivi:
Total duration: 34.91 msecs
Script duraton: 3.18 msecs
Rendering: 30 msesc (9.71 msecs recalculate style, 16.87 msecs layout, 3.22 msecs Pre-Paint, 1.11 msecs Paint)
One example trace for create 1k for vanillajs:
Total duration: 38.29 msecs
Script duraton: 3.08 msecs
Rendering: 34 msesc (10.03 msecs recalculate style, 19.66 msecs layout, 3.45 msecs Pre-Paint, 1.04 msecs Paint)
@localvoid Any ideas why chrome would perform layout faster for ivi?
Trying to find any differences in the DOM output. Maybe because <a>
elements has classes lbl
and remove
, but there aren't any css rules for this classes, so it is highly unlikely. https://github.com/krausest/js-framework-benchmark/blob/f3e82b3ed4cb47edfab85d072adc81f274318239/frameworks/keyed/vanillajs-2/src/Main.js#L9
@localvoid Good catch, but it's not causing that effect.
This is create 1k rows for ivi with the lbl and remove classes (I'll commit that too) and vanillajs without:
Overall a bad run, but ivi still renders 2.5 msecs faster (says chrome 124)
I'll commit that too
I think that this classes (lbl
, remove
) are used in vanilla because of event delegation. Originally, all implementations like React, etc didn't have this classes: https://github.com/krausest/js-framework-benchmark/blob/970b34f7704b6dbd496533fa25314c694cb3d05f/frameworks/keyed/react-classes/src/main.jsx#L57-L68
@krausest The bubbling process of selectRow and deleteRow can be stopped on tr
by stopPropagation
. That is, do not do stopPropagation
immediately on a click
, but do it only after reaching the 'parentNode
of the function' on tr
. I don’t know how to do it in vanilla yet. This will speed up the work.
@krausest I don't know, you can just take tr
and set the cancelBubble
property to true
. But this is not done this way - you need to use stopPropagation
somehow.
@localvoid You're right. I reverted ivi and removed the classes from the vanillajs* implementations.
The result (total duration) still looks like that:
Now what's interesting is the results for create 1k rows without warmup (i.e. initBenchmark just checking the existence of #run, no create / clear cycle):
And even more interesting is the rendering duration without warmup:
A flatline, which really makes sense (here...).
@krausest I like this last chart. I think you are looking for a more consistent measurement scheme, that can weed out the noise and measure the most important things about how the frameworks build DOMs. Will a vanillajs implementation that scores faster now still suffice for the long run?
It could be worth it to make the measurements more fine-grained here. Some aspects may then be more susceptible to browser implementation changes than others. Many will remain consistent, especially over a short period.
@krausest the delta in the number of nodes in the trace you showed above don't seem to match. Vanilla is 487 starting and 11488 total (11001 new nodes) vs ivi with 40102 starting and 50103 total (10001 new nodes). That's 1000 less new nodes for ivi, there's also the issue of why ivi was starting with so many nodes. Maybe GC didn't collect the nodes from the warm-up, not sure if that would have an impact, but it might.
@robbiespeed good catch, indeed ivi stands out that the dom nodes created by initBenchmark are still reported by chrome (in contrast to vanillajs, doohtml and solid). @localvoid is that intentional?
May have figured out the reason for the different node deltas. There's a bit of a hint if you look at the graph somewhere during layout for VanillaJS chrome will create 1k new nodes for the ::before
pseudo element of each row for the remove glyph (removing the glyph classes will make the node count a consistent 10 nodes per row). My guess chrome skips creating new nodes for the pseudo elements if there were old ones still around to "recycle".
It would be interesting to see how the benchmark fairs when removing the css that creates the pseudo elements.
@krausest No, I don't have a clue why this is happening. ivi doesn't have any type of DOM or js object recycling.
Ran vanillajs against ivi with and without the glyphs. Removing the glyphs didn't help close the perf gap.
@robbiespeed Maybe we're really on to something
Seems like forceGC() which called window.gc() a few times didn't clean the dom nodes for ivi.
If we run window.gc() and HeapProfiler.collectGarbage" the node count goes down:
And even more, the performance is closer to what we'd expect:
And performance doesn't change wildly when a 500 msec delay is added:
Now I'll run the benchmarks above in a rather clean run with the modified forceGC call. I'll report back.
Here's the result with the modified forceGC method:
Looks much more reasonable to me:
- Adding a delay doesn't yield a different order
- Much closer to manual testing
With 1.000 msecs delay the graph looks pretty similar, which is really good news!
Conclusion:
- Somehow calling 7x window.gc() didn't clean up the dom nodes for ivi.
- Running window.gc() and HeapProfiler.collectGarbage in a loop with seven iterations clears the dom nodes for ivi
- With the nodes gc'ed ivi performs similar to chrome 123 and similar to manual tracing (Which leaves one question: Why is ivi much faster when the nodes are no gc'ed? Is there some kind of caching in chrome 124?)
- This means I'm pretty confident we can get reasonable numbers for chrome 124 for all create actions. (There's might be still the remove row issue which starting with chrome 124 gives frameworks with RAF an advantage, but I'll save that for the next days...)
@paulirish Whenever it comes to GC we need your help. Running windows.gc() in a loop was your recommendation the last time. Is this still what you'd recommend? Any idea why HeapProfiler.collectGarbage collects more nodes for ivi than windows.gc() alone?
Update: Just checked two things:
- Indeed with chrome 123 window.gc cleared the dom nodes for ivi
- Increasing the loop count doesn't help when running only window.gc() for chrome 124. HeapProfiler.collectGarbage is needed to gc the nodes for ivi.
Chrome 123 with ivi and 7x window.gc. Node count drops after the gc calls from ~30k to 66. The next create rows call adds ~10k nodes.
Chrome 124 with ivi and 14x window.gc. Node count stays at 30k nodes and rises to 40k rows when create rows is performed:
Chrome 124 with 7x window.gc and HeapProfiler.collectGarbage. Node count starts with 66 (!) and rises to 10k rows:
What's still odd here is that the trace starts with 66 nodes and not with something like 30k to 50k nodes from initBenchmark. I checked all 15 traces for one benchmark run and every trace started with 66 nodes 🤯 . I don't understand that, but 10k at the end makes sense.
@krausest try window.gc({type:'major',execution:'sync',flavor:'last-resort'})
.
This is what the results I get when doing 7 gc with default params ({type:'major',execution:'sync'}
):
vs {type:'major',execution:'sync',flavor:'last-resort'}
:
notice vanilla and ivi swapped position and colour
I'm not entirely sure of the differences of last-resort GC, but it seems to be a more thorough GC pass meant for when the process has reached it's memory limits. I suspect HeapProfiler.collectGarbage
is doing a last-resort GC internally.
gc
options are documented in v8 source