FrameworkBenchmarks
Changes / Updates / Dates for Round 21
We would like to set our sights on a new official Round release. Because there have been some rules clarifications, we're going to need the community's help in identifying frameworks whose maintainers need to be pinged for updates so they can be included in the official results.
We'll aim for mid-to-late May for an official preview run. I'll add dates here with a couple of weeks' notice before locking PRs.
Rules Clarification:
- #7019
- If any pipelining or network optimization is performed, then the execution behavior and semantics must match what they would be if the multiple SQL statements were executed without the optimization, i.e. in separate roundtrips. For example, it is forbidden to perform any optimization that could change the transactionality and/or error handling of the SQL statements. To avoid any doubt, such optimizations are forbidden even if they wouldn't produce the behavioral change with the specific SQL statements of the TechEmpower benchmarks, but would produce it with other statements a user might supply.
Dates:
- All PRs must be in by Tuesday, May 31st to be accepted for Round 21. After that date, only PRs that address issues we've called out will be accepted. No performance updates/improvements, package updates, etc., will be accepted after this date.
Because there have been some rules clarifications
Could you clarify exactly what the changes are? It would also be good to open a discussion covering both the accepted and the rejected changes, so as not to pollute this issue.
Thank you !!
Thanks @joanhey - It's really just the db driver clarification (which currently only affects postgres). I updated the issue and will continue to do so as needed.
It would also be good to open a discussion covering both the accepted and the rejected changes, so as not to pollute this issue.
There's no need to open discussions about changes that weren't approved; I won't really be monitoring that. It's fine to post here if you need to. People really only need to watch the pinned post, but posting here is the easiest way to get a notification to my inbox, since I'm extremely busy with other work at the moment.
we're going to need the community's help in identifying frameworks whose maintainers need to be pinged for updates
I see two options, not mutually exclusive:
- Ping contributors to verify the implementation
- Start manually testing the top frameworks in the database benchmarks
We could have a table to keep track of each framework and its verification status. If contributors are not sure, then someone could help them with manual verification (I volunteer), which also means documenting the verification steps.
Question: in the Fortunes test, is it possible to:
- Cache the response #6529
- By pass the template engine #6883
Because some frameworks have similar or higher req/s in Fortunes than in 1-query, which is very odd.
If yes, I will update some frameworks before the next round.
We could have a table to keep track of each framework and its verification status. If contributors are not sure, then someone could help them with manual verification (I volunteer), which also means documenting the verification steps.
I'm also happy to help with manual verification. Let me know if/when that's needed. I'd probably use the process I described in this PR comment.
I did do that verification on most of the top frameworks already but I didn't record the results anywhere. The gist was that the frameworks in the tier above vertx in multiquery all failed verification, but those in the same tier were ok.
The database connection is OK. @fakeshadow, could you explain why Fortunes is faster than the 1-query test? Thank you.
could you explain why Fortunes is faster than the 1-query test? Thank you.
https://github.com/TechEmpower/FrameworkBenchmarks/blob/e49fddca6c27701586f4afaa98db4908abf0b976/frameworks/Rust/xitca-web/src/db.rs#L38
You can see the source code here. The query statement is cached for the fortunes test and it's just a single query with no parameters. For the single-query test you have to type-check the input id and properly encode it, which is more work and could very well be slower than a sort. (xitca-postgres acts the same as tokio-postgres in this regard: it uses a lock on the client's encoding buffer to help with memory re-use and fewer allocations, which makes it relatively slower on the encoding part.)
Edit: I forgot to add that there is also the extra cost of parsing the id out of the URI path into a number in the single-query test.
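To make the comparison concrete, here is a minimal sketch of the two request paths described above, written against tokio-postgres-style calls (the comment above says xitca-postgres behaves the same way here). The statement handles, table layout, and path parsing are illustrative assumptions, not the actual xitca-web code.

```rust
use tokio_postgres::{Client, Error, Statement};

// Fortunes: the prepared statement is cached and takes no parameters,
// so each request only binds and executes it and then sorts the rows.
async fn fortunes(client: &Client, stmt: &Statement) -> Result<Vec<(i32, String)>, Error> {
    let rows = client.query(stmt, &[]).await?;
    let mut out: Vec<(i32, String)> = rows.iter().map(|r| (r.get(0), r.get(1))).collect();
    out.sort_by(|a, b| a.1.cmp(&b.1));
    Ok(out)
}

// Single query: the id must first be parsed out of the URI path, then
// type-checked and encoded as a bind parameter before the query is sent.
async fn single_query(client: &Client, stmt: &Statement, path: &str) -> Result<(i32, i32), Error> {
    let id: i32 = path.rsplit('/').next().and_then(|s| s.parse().ok()).unwrap_or(1);
    let row = client.query_one(stmt, &[&id]).await?; // parameter encoding happens here
    Ok((row.get(0), row.get(1)))
}
```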
but the code cannot have foreknowledge of the table's size
It's not only a sort; you also have to properly escape all the rows.
But the important thing is not how fast the framework is. The more technical point is that the response from the database, and the response to the request, are both larger.
But if you explain it, the rest of us will learn. Thanks
It's actually an interesting topic. The top scorers in single query are using batch mode and they have a significant perf drop in the fortunes test. I suspect it has to do with the IO read overhead of batching on large responses.
Yes, but not all frameworks.
Is the Fortunes response in xitca-web cached?? Sorry, but I need to ask!!
Is the Fortunes response in xitca-web cached??
No. You can see the source code here. https://github.com/TechEmpower/FrameworkBenchmarks/blob/e49fddca6c27701586f4afaa98db4908abf0b976/frameworks/Rust/xitca-web/src/ser.rs#L55
Fortune takes ownership of its data, and in Rust, when ownership goes out of scope all the memory associated with it is dropped.
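As a minimal sketch of that ownership point (purely illustrative, assuming a hand-rolled renderer rather than the actual xitca-web code):

```rust
// Hypothetical fortunes rendering used only to illustrate the ownership argument.
struct Fortune {
    id: i32,
    message: String,
}

fn render(fortunes: Vec<Fortune>) -> String {
    // `fortunes` is owned by this function; only the rendered String escapes it.
    let mut html = String::from("<table>");
    for f in &fortunes {
        // (HTML escaping omitted in this sketch)
        html.push_str(&format!("<tr><td>{}</td><td>{}</td></tr>", f.id, f.message));
    }
    html.push_str("</table>");
    html
    // `fortunes` is dropped here, so the row data cannot outlive the request
    // and there is nothing left over for a later request to reuse as a cache.
}
```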
Sorry, I don't want to start a war :) I know how Rust works, and that it always cleans up the memory.
In the top 30 frameworks the issue is not just speed but the size of the network response, and it seems technically impossible for Fortunes to be faster than 1-query.
Sorry, I don't want to start a war :) I know how Rust works, and that it always cleans up the memory.
That's a legit question and I understand your concern. But the result is what it is. I'm pretty sure I follow all the rules in this bench and the whole project is open sourced on git. If you find anything I did wrong just ping me and I would fix it if that's the case.
@nbrady-techempower Hi! We (the Quarkus team) plan to send a PR to upgrade to the latest version (and other changes), but
We'll aim for mid-to-late May for an official preview run
which date exactly? many thanks!
@franz1981
which date exactly? many thanks!
No date, exactly. As usual, we're pretty backed up with client-facing work, but I'd get it in as soon as possible. I'll talk to the team today, but I'd say we'll shoot for closing PR's by the end of next week.
Hi everyone! Very excited to get Round 21 out the door. I'm out on Friday and there's a US holiday on Monday, so let's close PRs for Round 21 on Tuesday, May 31.
Everything opened before then will be QA'd and merged for the upcoming round. This round might take a few more preview runs than normal as we have to identify some frameworks that need updating to comply with the rules before we perform the run for the round.
One is Just(js), but there are more. They break https://github.com/TechEmpower/FrameworkBenchmarks/issues/7019
This is the first test where Just(js) has quite a big lead. This is likely due to the fact it is using a custom postgres client written in Javascript and taking full advantage of pipelining of requests. It also avoids sending a Sync/Commit on every query. As far as I am aware this is within the rules but will be happy to make changes to sync on every query if it is not.
https://just.billywhizz.io/blog/on-javascript-performance-01/ We still need to learn from this very good explanation. Thanks @billywhizz
@nbrady-techempower Also, I'm still waiting for an answer about the response cache in Fortunes. Last week I fixed some frameworks because they had one character less in the Fortunes URL path. But can any framework cache the Fortunes response?? If not, it only needs to be added to the rules that the response must not be cached. Thanks
author of just-js here.
thanks for pinging me @joanhey. i would have missed this deadline if you hadn't! :sweat_smile: i should be able to get a PR ready by monday to fix this issue in current just-js release and might also be able to upgrade to latest just-js framework and libraries.
i did some testing today against the top performing frameworks and, as @michaelhixson mentioned above, it seems there are still a number that will fail to meet the new requirements - i am not sure how to ping the authors so maybe @nbrady-techempower or @michaelhixson can reach out to them so they have a chance to make changes before the deadline.
here is what i found when i sniffed what was being put on the wire for the latest master branch. i should be able to test some more frameworks tomorrow once i have a PR ready with fixes for just-js.
i still feel what just-js currently does should be allowed for multi-query and update tests - it appends a SYNC to the batch of queries in every http request but it seems this won't be allowed according to the new rule. :cry:
just to be clear in case folks aren't understanding the new requirement:
- every Bind and Exec command on the wire must be followed by a Sync command for all benchmark tests against postgres
here is what correct and incorrect behaviour looks like in terms of what we should see on the wire. there's a lot more detail in the very long debate we had a few months ago.
pass
BIND
EXEC
SYNC
BIND
EXEC
SYNC
...
fail
BIND
EXEC
BIND
EXEC
BIND
EXEC
...
SYNC
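for maintainers using a driver rather than hand-rolling the wire protocol, here is a hedged sketch of what the passing pattern looks like, using tokio-postgres as an example - as far as i know each `query_one` call there emits its own Bind/Execute/Sync cycle, which matches the "pass" pattern above. the statement and table names are illustrative only, not any framework's actual code.

```rust
use tokio_postgres::{Client, Error, Statement};

// PASS: each query is a complete extended-protocol exchange ending in Sync,
// behaving exactly as if it had been sent in its own round trip.
async fn multi_query(client: &Client, stmt: &Statement, ids: &[i32]) -> Result<Vec<i32>, Error> {
    let mut out = Vec::with_capacity(ids.len());
    for &id in ids {
        let row = client.query_one(stmt, &[&id]).await?; // Bind, Execute, Sync
        out.push(row.get(0));
    }
    Ok(out)
}

// FAIL (under the clarified rule): hand-rolling the protocol so that many
// Bind/Execute pairs share one trailing Sync changes error handling and
// transactional behaviour, so it is forbidden even if it happens to return
// the same rows for these particular statements.
```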
on another note - @nbrady-techempower @michaelhixson would it be possible going forward to just cut and publish a new round of the benchmarks monthly or on some regular schedule? it seems this wouldn't be too much work on your end as the tests are already running continuously and results are generated automatically - i am happy to help with the effort in any way i can if you don't have time to work on this. the last round was more than a year ago and i think it would be useful for folks to have them published on a regular, known schedule. imho, it doesn't need a writeup or anything like that for each published round. happy to raise this as a separate issue if you think that would be a good idea.
@billywhizz Thanks for looking into this and for the additional work looking into the other top frameworks.
Thanks for your comment about regular rounds. It's something we've always wanted to do, but it does require a lot of work on our end. The compromise we made with this years ago is the continuous reporting on https://tfb-status.techempower.com. Additionally, we underwent some changes within the company that also delayed things this year. Ideally, I'd like to see a round released every quarter, but that also includes doing the due diligence of checking the top performers during several preview rounds before we release as "official". So in that sense, you're definitely already helping, and it's much appreciated!
@joanhey
Because some frameworks have similar or higher req/s in Fortunes than in 1-query, which is very odd.
just to point something out on this - it is actually possible that fortunes RPS could be greater than single query. if the database is the bottleneck and we have spare cpu cycles on the web server and don't saturate the network interface, then fortunes could run faster than single query. this may be to do with the fact that single query has to provide a parameter, which adds load on the outbound connection to the database and possibly requires a little more work inside postgres to parse and execute. fortunes is just a simple "select *" without a where clause.
this tool is useful for seeing where cpu is not fully utilized on the web server, indicating some other bottleneck, likely network saturation (for plaintext) or database overload.
@fafhrd91 just an FYI - i noticed on latest techempower run that ntex has increased perf considerably in single query and fortunes tests - from 484k RPS to 729k RPS and from 418k RPS to 602k RPS.
it looks like this improvement is due to a recent upgrade of the ntex framework in TE repository. This upgrade pulled in an updated Rust postgres library which you seem to maintain.
The reason it's performing much faster now is that the sync on every command has been removed; it now seems to sync only once every 50 commands per connection.
I just wanted to point this out because, as you can see in the discussion above, this behaviour will no longer be allowed and you will need to change it if you want to be included in the next official run (please correct me if i am wrong on this @nbrady-techempower).
also pinging maintainers of following frameworks which i have checked - you will all need to submit PRs in order to be compliant with new rules too as far as i can see.
- lithium: @matt-42
- drogon: @an-tao
- ffead-cpp: @sumeetchhetri
- redkale: @redkale
I just wanted to point this out because, as you can see in the discussion above, this behaviour will no longer be allowed and you will need to change it if you want to be included in the next official run
That's correct. Preview runs will be starting this week and I'll be removing tests that aren't resolved. PRs to add those tests back will need to address the issues above.
Seems to be a regressive move!! Moreover, the same optimisations for the postgres client are equally available for everyone to integrate within their frameworks. Also, it would have been appropriate to have provided a full cycle of tests (Round 22) to target these modifications in the requirements (Postgres Sync Change -- updated on March 22).
hi @sumeetchhetri. i had a very long debate with the TE folks and others who proposed this change but the consensus seems to be that it's necessary in order to ensure frameworks are using the most common, safe patterns rather than optimising for the specific tests in the benchmark suite. i can understand that POV even if i don't wholly agree with it. :man_shrugging:
@nbrady-techempower is there any leeway to extend the deadline a little considering the fact quite a few maintainers, including myself, are only now finding out about it? if not, could you clarify when the exact UTC time is for PR's to be submitted for the cutoff? thx.
Here is what I understand right now: only in the update test can we not batch update commands, since all of them will be rolled back if any one fails (this is what lithium-postgres-batch is doing). Batching the select queries of all tests is accepted since there is no concept of a transaction for selects.
@billywhizz what I understand from your post is stricter: pgsql batching is forbidden on all tests for all queries (lithium-postgres already complies with this).
So in the end I'm not sure I understand the new rule. Could you @nbrady-techempower make it clearer by saying explicitly in which tests and on which requests pgsql batching is forbidden?
Thanks !
@matt-42
@billywhizz what I understand from your post is stricter: pgsql batching is forbidden on all tests for all queries (lithium-postgres already complies with this).
yes - this is the rule that is being enforced as i understand it - not my decision and not something i agree with (i would be fine with batching of syncs within the context of a single http request) but this is the new rule as explained above.
also, from my testing, lithium currently breaks this rule on all tests on the master branch, so it will be excluded from the next official round unless you change it to have an explicit sync for every single query that is sent to postgres. i.e. postgres "pipelining" is not allowed on any of the tests.
@nbrady-techempower should be able to confirm this is the case. :pray:
at this stage there seems to be so much confusion that, given the amount of effort maintainers put into these benchmarks, i would suggest postponing this deadline and giving maintainers who were not aware of the deadline or misunderstood the new requirement the opportunity to make changes so they are not excluded from the next official round.
actually it seems that fixes will be allowed after the deadline from what @nbrady-techempower said above:
After that date, only PRs that address issues we've called out will be accepted. No performance updates/improvements, package updates, etc., will be accepted after this date.
so the deadline tomorrow seems to be only for changes/updates that are not related to failures caused by the new rule. is that correct @nbrady-techempower? if so, can you please clarify the exact deadline for updates/PRs as I would like to submit a PR before this deadline with latest just-js framework and libraries. is it 23:59 on 31 May 2022 UTC?
Just to clear up the confusion on the updates test (@matt-42 and @billywhizz): if you are within a transaction boundary, a single SYNC is (or at least should be) surely allowed (libpq pipeline mode); PIPELINE_START;BEGIN;UPDATE 1;UPDATE 2;.....;COMMIT;PIPELINE_SYNC should be fully valid and work as expected within a single transaction block. I just looked at the source for ffead-cpp and it follows this principle for the updates test. Now coming to the select tests, specifically for multiple selects, an ending SYNC should never be a concern as it is a single unit when viewed externally. The only change, as far as I'm concerned, would be needed for the single-query/fortunes tests, which need to be excluded from the batching process.
@sumeetchhetri this is not my understanding - as far as i understand it the multi query and update tests should also have an explicit sync (as if autocommit was on) wrapping every individual query and update statement. if you have the patience to read through the debate thread there is much discussion of this and disagreement from me on why it is necessary, but there you go. :man_shrugging:
you can just follow from here where i ask the same question you are asking now and @michaelhixson gives a pretty clear response. sure, that's frustrating and i sympathise with your position, but at the same time i can understand the desire on behalf of TE to benchmark the most common real-world usage of the database libraries and not to allow optimizations that are tailored specifically to the TE tests as currently specified.
so, again, in all tests this will not be allowed on the wire:
BIND
EXEC
BIND
EXEC
...
SYNC
i.e. every bind and exec on the wire must be followed by a sync.
It seems like, as it is, people can have different interpretations of this new rule. Can we make it more concrete and less subject to interpretation? If batching is forbidden, why not just say that postgresql batching is forbidden?
First, I want to make sure everyone understands that this isn’t a “new rule.” This was always the intention, but we weren’t clear enough. @billywhizz has a good understanding of the updated text and I encourage everyone to read the thread where @michaelhixson states pretty clearly our position. @matt-42 I’m not sure what different interpretations follow from the updated rule after revisiting that thread.
Do we need to add:
If using PostgreSQL's extended query protocol, each query must be separated by a Sync message.
for it to be clearer? I'm getting the sense that everyone here understands the updated text but may disagree with it. I don't want to rehash that here (but I don't want to discourage more conversation about it for the future either, so please continue that discussion on another thread.)
@billywhizz 2 weeks from Tuesday; we generally do about 2 weeks of preview runs. We only put a hold on updates/optimizations during preview runs (as they could always introduce - intentionally or not - more misinterpretations of the rules.) So, yes, PRs to address issues with meeting test requirements will be accepted.
@nbrady-techempower thanks for the clarification - hopefully my attempt at explanation didn't confuse things further! =)
sorry to bug you further but does Tuesday mean I can submit a PR with changes to my framework by end of today or end of tomorrow (Tuesday)?
@billywhizz End of day tomorrow is fine. The first preview run will start in ~30 hours according to tfb status
Do we need to add:
If using PostgreSQL's extended query protocol, each query must be separated by a Sync message.
Yes, I guess this will help others to understand without having to go through the long debate. You can even add: Pgsql query batch/pipeline mode is not allowed. This will be clearer for those who do not know about the pgsql wire protocol.
I did not want to discuss this rule, just wanted to make sure everybody is on the same page. Looking forward to seeing Round 21 released!
@nbrady-techempower and @michaelhixson, do we have a definitive list of all frameworks using postgresql pipelining/batching, directly or indirectly? There are multiple frameworks that rely on external libraries (Rust/Java/.NET/possibly other languages) which internally rely on pipelining.
Who would we be relying on to provide such a list? Do we know what it takes to find all the offenders against this new rule?
IMHO the first step should be to clearly identify the frameworks not adhering to the new spec; once such a list is officially available we would be in a position to get this change ready for Round 21.
Until there is such a list, this effort where we just look at the top 5-6 frameworks does not look good. I'm all for a level playing field: let's get rid of postgresql pipelining completely, but please let's first ensure that everyone falls in line before an official round. Until there is such a list, Round 21 should be postponed indefinitely.
We have the ability to test frameworks manually pretty easily; it's just not built into the verification suite. I believe @billywhizz has also offered to help. Quite frankly, I'm not interested in spending much time worrying about frameworks with 1/10th of the performance of the top frameworks. We know they're not doing optimizations like this.
IMHO the first step should be to clearly identify the frameworks not adhering to the new spec; once such a list is officially available we would be in a position to get this change ready for Round 21.
You can make changes to your framework without such a list. Maintainers that pay attention to the benchmarks know whether they're doing this or not. We are not postponing Round 21 "indefinitely" and will start removing tests that don't adhere to this rule as preview runs start coming out. Please move additional discussion about Postgres pipelining to a new issue.
Edit: This is the case with all rules. Sometimes people slip by us. We don't have the manpower to go digging 5 libraries deep in every framework, in every test permutation. We rely on the community to also help identify. Our automated verification tool doesn't catch everything. We'd never release a round if that were the case (some argue it already feels like never). In the past, we've also removed results after the fact.
Just created a PR removing all lithium tests using pipelining: https://github.com/TechEmpower/FrameworkBenchmarks/pull/7382
https://github.com/TechEmpower/FrameworkBenchmarks/pull/7383
i should have a PR ready in a couple of hours with latest just-js release and libraries with pipelining disabled on all tests too. hope that is ok @nbrady-techempower! if not, i can do a quick PR for the current (very old) version of just-js that is in TE repo but i'd prefer to get the newest one in for round 21 if i can.
That's fine @billywhizz! We're in the middle of a run right now. The first preview run is up next.
PR #7384 for just-js which does not have pg pipelining enabled is now ready.
The final preview run is happening now on tfb-status: Run ID: dbde77c0-3a66-491d-9698-8b075e29baa8
Ignore any run that automatically starts after this. I'll be stopping the run, spending some time to make sure all frameworks that aren't adhering to the rules clarifications are removed, and then starting the Round 21 run manually.
Thanks for your patience, everyone!
Please ping me when Round 21 is available on the website! Quite excited. 😅
There may be an error in the current preview for php, compared to Round 20: on the "framework" filter tab, "php" is listed as a framework. php isn't a framework; raw php should appear when we select "none" in the frameworks filter.
I think this is important because this filter allows displaying back-end stacks that do not rely on a framework, which is an additional dependency that might be unwanted for complex projects.
The presence of "php" in this place may lead people to think that php is a framework, which is not the case. Thanks.
I added the PHP name in the benchmark_config.json; without it, it doesn't appear in the composite scores.
I also added nodejs, go, ..., so you can compare the frameworks vs the platforms in the composite scores.
But platforms already appeared in the framework filters before: asp.net core, justjs, cfml, actix, ....
If you want to see only platforms, you can do it in the filters panel, and you will see a lot of platforms in the frameworks category.
It would be good if asp.net core separated the names in the config; right now we can only see the platform results in the composite score, and we can't compare the overhead of the asp.net frameworks (middleware, mvc, ...) vs asp.net.
Official Round 21 Run has completed.
If anyone here has the time to review, please do. FAF lands 8.6m rps again in plaintext. I know folks are digging into this perceived issue, but unless there's a conclusion, I'm going to post the results as is. I may add an asterisk
with a link to the open issue about FAF surpassing the theoretical limit. I'll discuss with the team.
I added the PHP name in the benchmark_config.json; without it, it doesn't appear in the composite scores. I also added nodejs, go, ..., so you can compare the frameworks vs the platforms in the composite scores. But platforms already appeared in the framework filters before: asp.net core, justjs, cfml, actix, ....
If you want to see only platforms, you can do it in the filters panel, and you will see a lot of platforms in the frameworks category.
It would be good if asp.net core separated the names in the config; right now we can only see the platform results in the composite score, and we can't compare the overhead of the asp.net frameworks (middleware, mvc, ...) vs asp.net.
Thank you for your answer. In that case, maybe the "Framework" category should be renamed to something else, because it can be misleading to have non-frameworks appearing in a menu labeled "Framework". Maybe "Framework or Platform" or "Framework/Platform". Ideally there could perhaps be a separate filter per "Classification", but that would probably be too much work for little benefit.
@nbrady-techempower I have noticed that the rust/ntex framework had not been appearing in the unofficial results since early March (about 4 months ago), and it did not build/run for the run which became Round 21. It turns out this was due to a config error in a commit to this repo back then, which evidently errored out the build process very early on in each run -- since ntex did not even show up in the "Did not complete" status at the bottom of each result page for any of those runs.
@fafhrd91 has submitted a PR to this repo (in the past 24 hours) at https://github.com/TechEmpower/FrameworkBenchmarks/pull/7439 to address this problem. Is there any possibility once this is merged that an additional run could be arranged for the official "Round 21" results so that the TechEmpower Benchmarks will not be missing results for this popular framework? As a reference, it had managed to score 4th overall in the Composite results in Round 20.
(note -- I am just a public user of this web framework; I have no affiliation with the development team) [edits for clarity]
PHP Symfony was working without problems until the official run. After testing it locally, it also failed. I fixed it in #7464. Symfony is a popular PHP framework that needs to be in the TFB results (if possible).
Thanks everyone. Unfortunately, we just had to bring the machines down. Emergency maintenance in the IT room. I'm not sure when we'll be back up.
I don't think we want to set the precedent of doing additional runs just because a fw or 2 failed. We could do another 6-day run and then a different popular framework might fail, so we're going to leave as is. Unless the results are off across the board for some reason, we're going to go with this run.
What we can do is get Round 22 out in just a few months.
tfb-status / citrine are still experiencing some technical difficulties. It may not be resolved this weekend.
Round 21 results will be officially posted on Monday. I'll open a new issue for Round 22 which I would like to publish in Oct/Nov.
@nbrady-techempower
I value this benchmark and don't want to cause any suspicions about the results. As such, as the author, I am asking that FaF be excluded from this round of official results as I mentioned in #7402 until we have a satisfactory explanation.
@errantmind I appreciate that very much. No problem.
Other people are doing these tricks and saying nothing. Thanks @errantmind. We still need to be clear about that: https://github.com/TechEmpower/FrameworkBenchmarks/issues/6967#issuecomment-1000261336
But I hope Round 22 will be more correct. We need to open a discussion about that for the next round.
We need to clarify the rules further. @nbrady-techempower
@joanhey i think the previous suggestion to add a random number on each request to the extra row for fortunes test would be a good one to avoid any caching of results. maybe we should create an issue with suggestions for further tightening of the rules so they are all in one place?
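a minimal sketch of what that suggestion could look like (purely illustrative, not part of any current requirement - the struct and the per-request nonce parameter are assumptions):

```rust
// If the extra fortune row carried a value that changes on every request,
// a byte-identical cached fortunes response could never pass verification.
struct Fortune {
    id: i32,
    message: String,
}

fn add_extra_fortune(mut rows: Vec<Fortune>, per_request_nonce: u64) -> Vec<Fortune> {
    rows.push(Fortune {
        id: 0,
        // the per-request nonce makes every rendered page unique
        message: format!("Additional fortune added at request time. ({per_request_nonce})"),
    });
    rows.sort_by(|a, b| a.message.cmp(&b.message));
    rows
}
```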
there are also some changes that could be made to make it easier to automatically verify compliance with the rules - when i tried to do this across many frameworks for the pipelining rule it was very difficult without manual work due to the current structure of the requirements and the various tricks different frameworks get up to.
in doing this work i noticed a number of frameworks which warm up their caches (this is currently allowed, but i don't think it should be) before tests start and also ones that run profiling before tests start and re-compile themselves based on profiling information. not sure that should be allowed either. it also makes it more difficult to have any rigour in verifying expected number of db requests against actual as tests are run.
I'm not sure if it's intended, but the link on the Round 21 page that says it goes to the Blog actually points here. I would very much be interested in a summary of results/changes as was done in the past. I'm not currently seeing that in the blog, which is still at Round 20.
@rcollette It was intended. The only real change for this round was the rules change listed atop the thread. In the future, we'd like some maintainers to write a small blurb about things they did/encountered when preparing for the next round. For now, there won't be a blog post.
In the future, we'd like some maintainers to write a small blurb about things they did/encountered when preparing for the next round.
I certainly look forward to that type of blog post in the future.
... ones that run profiling before tests start and re-compile themselves based on profiling information. not sure that should be allowed either.
I am pretty sure you mean h2o here - banning this behaviour would give the dynamic languages that run in a JIT runtime (e.g. a JVM) an unfair advantage because most of them do exactly the same thing, except that it is transparent to the user.
I am pretty sure you mean h2o here - banning this behaviour would give the dynamic languages that run in a JIT runtime (e.g. a JVM) an unfair advantage because most of them do exactly the same thing, except that it is transparent to the user.
i was thinking of lithium in particular as that was the one i had noticed doing this - am sure there are others too. https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/C%2B%2B/lithium/compile.sh#L22
i don't think it's an "unfair" advantage when this is a natural advantage JIT has in the real world and is a possible reason to choose JIT over static, depending on the workload. what's the point in the benchmark if it doesn't give us an insight into which languages and platforms might have advantages over others because everyone has heavily optimised their entry to fit the specific tests?
i don't think it's an "unfair" advantage when this is a natural advantage JIT has in the real world and is a possible reason to choose JIT over static, depending on the workload. what's the point in the benchmark if it doesn't give us an insight into which languages and platforms might have advantages over others because everyone has heavily optimised their entry to fit the specific tests?
I suppose you have done that by accident, but what you wrote might give the impression that you consider feedback-directed/profile-guided optimization for static languages to be somehow not a real-world optimization and to be akin to "coding to the benchmark" - it is not. It is a well-understood generic optimization mechanism, and there is work to make it applicable to an even wider range of use cases (e.g. AutoFDO). In fact, the relative ease with which both h2o and lithium deploy it supports that, and, even better, there is an argument to be made that FDO/PGO as done by them makes more sense than the one done by a tiering compiler in a JIT runtime because it obtains a profile across the whole set of TechEmpower benchmarks instead of the currently running one (an imperfect fix for a JIT runtime would be to run at least the warm-up phases of all benchmarks in parallel instead of sequentially, as is right now), so it would be more likely to produce a result that is not adapted to a particular benchmark, but is a balanced binary that has predictable performance for everything.
And yes, obviously I disagree that this is a "natural" advantage of JIT runtimes. The real advantages are:
- FDO/PGO is (way) easier to deploy, even when compared with AutoFDO
- JIT runtimes tend to build optimizations on top of it that static language compilers ignore, e.g. speculative devirtualization/monomorphization
In particular, the second advantage would be perfectly well reflected in the results even if FDO/PGO is allowed for languages that are compiled ahead-of-time.
@volyrique i'll leave it up to the TE folks to decide but there's much confusion about what these benchmarks are for and there seems to be a perception out there (on HN/twitter/reddit) that they are rendered meaningless/ridiculous by the extreme lengths maintainers go to in order to achieve the top scores.
the reality i think is that only a tiny fraction of devs out there are interested in those kinds of micro-optimizations and would prefer to see a realistic comparison of standard implementations handling a range of different workloads without specific optimizations. i myself was on the other side of this debate in wanting to see the more extreme optimizations and understand what would be possible if we optimized everything we could, but i have since been won over to the realistic side of the argument.
it might be best to have two distinct categories for "realistic" and "optimized" and have two completely different rankings for them? it certainly seems the status quo is too difficult to police and leaves too much room for "gaming" the rules.
the reality i think is that only a tiny fraction of devs out there are interested in those kinds of micro-optimizations and would prefer to see a realistic comparison of standard implementations handling a range of different workloads without specific optimizations.
This is a false dichotomy - there are certainly generic micro-optimizations, and PGO for the C programs in this repository tends to gravitate towards this category IMHO.
it might be best to have two distinct categories for "realistic" and "optimized" and have two completely different rankings for them?
I fail to see how that is going to help with the policing issue - it is just changing labels, unless I am missing something. Also, we kind of already have the same thing with the stripped implementation approach.
Is there an easy way to click through and see which versions of frameworks were used, which serializers, etc.?