
No longer accepting plaintext-only frameworks / Limiting the number of test mutations

Open NateBrady23 opened this issue 2 years ago • 15 comments

Hi everyone!

As the number of new frameworks submitted to the benchmarks grows, the amount of time it takes to complete a full run does as well. Because of this, we will be implementing the following rules:

  • New frameworks that only implement plaintext will no longer be accepted. Of course, we'd like all frameworks to implement all tests to get a better idea of performance in various areas of the framework, but we expect at least two different tests to be implemented: ideally plaintext or JSON plus one database test.

  • The number of test mutations will be limited to 10. We do not mind if you open up pull requests between runs to try out various mutations for your framework so long as the total number at any given time does not exceed 10.

After the next round, we will ping framework maintainers to make these changes. We will also look to remove tests that are older and no longer maintained.

Thank you!

NateBrady23 avatar Sep 14 '23 14:09 NateBrady23

Rules like these show how popular the project is, and I agree with both. On top of that, I suggest calculating the composite score per mutation, which would offer a quick per-mutation view of the details.

fakeshadow avatar Sep 14 '23 14:09 fakeshadow

@nbrady-techempower on the number of mutations, I proposed https://github.com/TechEmpower/FrameworkBenchmarks/pull/8055 some time ago but then dropped it given the community feedback. It might be worthwhile to re-check it.

gi0baro avatar Sep 15 '23 17:09 gi0baro

I like it a lot, but there has been a long-standing problem: the moment a framework is removed, all of its history disappears from the Rounds.

As I said before, the Rounds need to be immutable. For example, in PHP we needed to change the name because it was php5; after the change, plain PHP no longer appears in the old Rounds. We have the numbers and the work done, but they don't show in the Rounds.

joanhey avatar Sep 17 '23 17:09 joanhey

One framework to remove: Baratine. The domain baratine.io is no longer registered to the project (careful, clickbait!), and the GitHub project's last changes were 7 years ago (https://github.com/baratine/baratine).

otrosien avatar Oct 26 '23 17:10 otrosien

In reality, Baratine is marked as Stripped.

Why not exclude all the stripped frameworks from the runs?

https://github.com/search?q=repo%3ATechEmpower%2FFrameworkBenchmarks+%5C%22Stripped%5C%22+OR+%5C%22stripped%5C%22+path%3A%2F%5Eframeworks%5C%2F%2F+repo%3ATechEmpower%2FFrameworkBenchmarks&type=code

joanhey avatar Oct 26 '23 17:10 joanhey

> In reality, Baratine is marked as Stripped.
>
> Why not exclude all the stripped frameworks from the runs?
>
> https://github.com/search?q=repo%3ATechEmpower%2FFrameworkBenchmarks+%5C%22Stripped%5C%22+OR+%5C%22stripped%5C%22+path%3A%2F%5Eframeworks%5C%2F%2F&type=code

I disagree. In xitca-web, the Stripped bench is used to avoid polluting the default leaderboard while keeping performance tracking of low-level system software, like the OS and the language (and/or program) runtime, at the same time. In fact, Stripped is a fairly arbitrary category, because there are even more unrealistic benches marked as Realistic. Unless there is a unified standard to determine which benches must be Stripped, it's unfair to exclude them.

fakeshadow avatar Oct 26 '23 19:10 fakeshadow

@fakeshadow OK, I'm happy that this information is useful.

And about what needs to be Stripped: I think it's the job of all the devs here to help clarify the requirements and also to identify the frameworks that bypass them.

joanhey avatar Oct 26 '23 23:10 joanhey

> @fakeshadow OK, I'm happy that this information is useful.
>
> And about what needs to be Stripped: I think it's the job of all the devs here to help clarify the requirements and also to identify the frameworks that bypass them.

Unfortunately, the meaning of "Realistic" is subjective, and from the existing bench code it's clear that opinions among bench maintainers are very divided. Therefore I doubt common ground can be reached easily. Actually, I'm fine with the current configuration, where the category is up to the maintainers to decide: when people look into the code and figure it out, they'll know which frameworks and communities share their opinion. In other words, as long as a stripped bench can run in non-official runs, I personally find it fine. As for broken (or outdated) benches, I believe we can use the "broken" tag to stop them from hogging resources in runs.

fakeshadow avatar Oct 27 '23 08:10 fakeshadow

One thing I've been thinking is not quite fair: combining results from different framework mutations into the composite score. Surely the composite score should reflect a single configuration and that configuration's performance across all benches?

For example, if we look at ntex, which was top of the last official round, the different flavours get wildly different scores across the different benchmarks. Is it fair to pick the best mutation in each category and combine those for the composite? Is it even possible to run a single service on ntex that would score highly across all benches? It doesn't seem so, but that is surely what the composite score should be measuring.

Maybe a better system would be to sum up the scores across all benchmarks for a particular mutation and then, for each framework, choose the mutation that got the best composite score?

Maybe this has been raised before; sorry for bringing it up again if so.
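A minimal sketch of that scheme, with hypothetical framework/mutation/score data (the names and numbers are made up for illustration; they are not real round results):

```python
# Hypothetical per-benchmark scores, keyed by framework, then mutation.
scores = {
    "ntex": {
        "ntex-tokio":     {"plaintext": 95, "json": 90, "fortunes": 60},
        "ntex-async-std": {"plaintext": 70, "json": 65, "fortunes": 85},
    },
}

def best_mutation_composite(framework_scores):
    """Sum each mutation's scores across all benchmarks, then pick the
    single mutation with the highest composite -- rather than combining
    the best per-benchmark results of different mutations."""
    composites = {
        mutation: sum(bench_scores.values())
        for mutation, bench_scores in framework_scores.items()
    }
    best = max(composites, key=composites.get)
    return best, composites[best]

print(best_mutation_composite(scores["ntex"]))  # ('ntex-tokio', 245)
```

The key property: each mutation's composite is computed from that mutation's own scores only, so the reported number is achievable by a single deployed configuration.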

billywhizz avatar Dec 08 '23 06:12 billywhizz

> One thing I've been thinking is not quite fair: combining results from different framework mutations into the composite score. Surely the composite score should reflect a single configuration and that configuration's performance across all benches?
>
> For example, if we look at ntex, which was top of the last official round, the different flavours get wildly different scores across the different benchmarks. Is it fair to pick the best mutation in each category and combine those for the composite? Is it even possible to run a single service on ntex that would score highly across all benches? It doesn't seem so, but that is surely what the composite score should be measuring.
>
> Maybe a better system would be to sum up the scores across all benchmarks for a particular mutation and then, for each framework, choose the mutation that got the best composite score?
>
> Maybe this has been raised before; sorry for bringing it up again if so.

I agree with you on the composite score issue. Besides incompatible features, it's common practice in the bench for frameworks to implement low-level json and/or plaintext to boost their composite score, which is questionable to say the least.

Speaking of ntex: from what I see, the current bench has to choose one async runtime, which means its tokio and async-std flavor scores can't be achieved at the same time. That said, it's possible to modify the code to combine multiple runtimes and get the best of both; it would be a big refactor, but it can be done.

fakeshadow avatar Dec 08 '23 14:12 fakeshadow

Should we remove frameworks like gnet? It only implements plaintext and isn't actually doing any parsing or routing: it just scans to the \r\n\r\n and sends a canned response, which doesn't meet the test requirements.
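For readers unfamiliar with the shortcut being described: gnet itself is Go, but the technique amounts to roughly the following (a hypothetical Python sketch, not gnet's actual code):

```python
import socket

# Pre-serialized response bytes, reused for every request.
CANNED = (b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
          b"Content-Length: 13\r\n\r\nHello, World!")

def serve_once(conn: socket.socket) -> None:
    """Read until the end of the request headers (\\r\\n\\r\\n) without
    parsing the method, path, or headers, then send a canned response.
    No routing, no header handling -- just a byte scan, which is why
    such an entry fails the test requirements."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:
            return
        buf += chunk
    conn.sendall(CANNED)
```

Any request, to any path and with any method, gets the same 200 response, so the "server" never exercises the framework's routing or header-parsing code paths.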

MarkReedZ avatar Mar 26 '24 05:03 MarkReedZ

@MarkReedZ, your project also has bugs: https://github.com/TechEmpower/FrameworkBenchmarks/issues/9055

remittor avatar May 24 '24 17:05 remittor

@remittor The point is to report issues, not to personally attack people

sgammon avatar Jun 05 '24 20:06 sgammon

@sgammon

> The point is to report issues, not to personally attack people

Where do you think I crossed from reporting bugs into making personal attacks?

> Wow look at the big brain on Oleg

Your message https://github.com/TechEmpower/FrameworkBenchmarks/issues/9055#issuecomment-2150948769 reads more like an attack than the results of my review of @MarkReedZ's code do.

remittor avatar Jun 07 '24 03:06 remittor

@remittor

> Should we remove frameworks like gnet?

This is a statement about a project.

> your project also has bugs:

This is a statement that is personal.

> Your message [...] is more suitable for attack

True!

sgammon avatar Jun 07 '24 07:06 sgammon