TFB Results Website Update
#9740 brought to light some issues with the results website for which we, frankly, did not have a good resolution.
The extremely high-level explanation of what has changed is as follows:
- Rounds > 22 will have their own metadata instead of our trying (and failing) to maintain, merge, and adjust a single metadata set from round to round. This has been an enormous pain point with the results website for years. I rolled the metadata back to what it was as of Round 22, then implemented Round 23 with the new functionality - these should now exist in isolation, so a new round won't break the previous round's results (a rough sketch of the idea follows this list).
- Rewrote the entire webapp, moving it from a Svelte 4 port of a bespoke HTML/JS app to a Svelte 5 port of that same app.
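
For illustration only, here is roughly what round-isolated metadata loading could look like. This is a minimal TypeScript sketch under assumed names: `loadRoundMetadata`, the `RoundMetadata` shape, and the `/metadata/round-N.json` path are all hypothetical, not the actual implementation.

```ts
// Hypothetical sketch: the real file layout, paths, and types in the
// results webapp may differ. The point is that each round's metadata is
// loaded from its own artifact, so rendering Round 23 never reads or
// mutates Round 22's data.
interface RoundMetadata {
  round: number;
  frameworks: Record<string, unknown>;
}

async function loadRoundMetadata(round: number): Promise<RoundMetadata> {
  const response = await fetch(`/metadata/round-${round}.json`); // assumed path
  if (!response.ok) {
    throw new Error(`No metadata found for Round ${round}`);
  }
  return (await response.json()) as RoundMetadata;
}
```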
Our goal is to eventually open-source the results website so contributors can open issues and pull requests for new features. We are now a step closer to that goal.
However, I'm sure that I've broken things. There will be some edge cases I've simply forgotten about, or some way the filters used to work, or something. Consider this issue the place to report those sorts of things for the next couple of weeks. I will be addressing issues ad hoc and reporting back here as well.
Thanks for working on this 👍🏼 Will there be a composite score as well? I use it a lot to see if changes are an improvement.
The composite scores required the legacy metadata-merging strategy, so no, there will not be composite scores moving forward unless we can come up with some other strategy that works with the isolated metadata approach, which I believe is the better approach on the whole.
You can't actually see them anymore, but the composite scores had effectively been broken for three rounds because most of the tests identified did not pass the requirements to be considered and used... it was basically aspnetcore and maybe a Rust framework (I honestly don't remember).
Would it make sense to make the Fortunes test the first tab, since it seems to be the default when opening the results?
I'm not sure the order matters.
We made the Fortunes test the default tab because the opinion, at the time, was that it represented the most real-world example of what a webapp framework would be providing - templating, trivial retrieval, serialization... you get it.
Personally, I think things have changed and shifted. If I were a developer making decisions about a tech stack with performance as my first key metric, I would probably be looking at the Queries and JSON tests and making sure the stack scores high on those. The next metric I would factor in is developer ergonomics.
Maybe I'll make it select a tab at random on page load 😆
> The composite scores required the legacy metadata-merging strategy, so no, there will not be composite scores moving forward unless we can come up with some other strategy that works with the isolated metadata approach, which I believe is the better approach on the whole.
I'm probably misunderstanding how the composite score was calculated, but I thought it was the sum of all test scores, each multiplied by a weighting factor. Something like:
`t1*f1 + t2*f2 + … + tn*fn`
And if there are test variants (for example, a different database), the highest score is used. (There is an open issue stating this isn't fair and that the best composite score should be used instead.)
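
For what it's worth, that description amounts to a straightforward weighted sum over best-variant scores. Here is a minimal TypeScript sketch of that calculation; the `TestResult` shape, the `WEIGHTS` values, and the test-type names are all assumptions for illustration, not the actual TFB scoring code.

```ts
// Hypothetical shapes -- the real results JSON differs; this only
// illustrates the weighted-sum idea described above.
interface TestResult {
  framework: string;
  testType: string; // e.g. "json", "fortune", "query"
  variant: string;  // e.g. which database was used
  rps: number;      // requests/sec for this variant
}

// Assumed weighting factors per test type (the f1..fn above).
const WEIGHTS: Record<string, number> = {
  json: 1.0,
  fortune: 1.5,
  query: 1.2,
};

function compositeScore(results: TestResult[], framework: string): number {
  let score = 0;
  for (const [testType, weight] of Object.entries(WEIGHTS)) {
    // Take the highest-scoring variant for this test type (tk above).
    const best = Math.max(
      0,
      ...results
        .filter(r => r.framework === framework && r.testType === testType)
        .map(r => r.rps),
    );
    score += best * weight; // tk * fk
  }
  return score;
}
```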
This languages-benchmark visualization has a Champions option, which shows only the fastest variation for each language: https://pez.github.io/languages-visualizations/ Would it be possible to add this to the TFB results, per framework? There are a lot of variations shown currently; reducing them to the fastest only would make things easier to compare.
Do we really mean "language" here, or do we mean "reduced to versus"?
Both are doable, but I take it you're specifically asking to have, say, 1 Ruby test, 1 Python, 1 Rust, etc. in your view. This could be another filter to toggle (a sketch of what that might look like is below).
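
A minimal sketch of such a toggle, assuming a hypothetical `Entry` shape with a single throughput number per entry (the real results data is richer than this):

```ts
// Hypothetical "champions" filter: keep only the fastest entry per group.
// Group by language to mirror the linked visualization, or pass a
// different key function to group per framework instead.
interface Entry {
  framework: string;
  language: string;
  rps: number; // best requests/sec for this entry
}

function champions(
  entries: Entry[],
  key: (e: Entry) => string = e => e.language,
): Entry[] {
  const best = new Map<string, Entry>();
  for (const e of entries) {
    const k = key(e);
    const current = best.get(k);
    if (current === undefined || e.rps > current.rps) {
      best.set(k, e);
    }
  }
  return [...best.values()];
}
```

Wiring that up as another toggle alongside the existing filters would then just be a matter of applying `champions` to the visible result set.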
The first time I read "composite score" here, I was thinking of #9391. That turned out to be something very different; I liked that approach as well, but it was not well implemented.