Programming-Language-Benchmarks: Make benchmarks more precise and complete?

Hi,

I was interested in the java vs csharp comparison.

I was quite surprised by the published results, in particular the bintree one, where dotnet looks insanely slow, so I tested it locally. I don't get exactly the same results (basically, the difference falls within the margin of error). Here is what I did:

1. take 2.java and copy/paste it into a 1.cs (fixing the language syntax but not the branching/code path)
2. compile both (in release mode, and in AOT mode using GraalVM for java)
3. run both with time

Here are my results (I did multiple runs, and timings vary by about 20% from run to run since the execution is very fast):

/tmp/test $ time java app # java 22 standard mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,070s
user	0m0,073s
sys	0m0,018s
rmannibucau@rmannibucau-yupiik:/tmp/test $ time ./bin/Release/net8.0/test # dotnet standard mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,072s
user	0m0,039s
sys	0m0,012s
rmannibucau@rmannibucau-yupiik:/tmp/test $ time ./app # java native mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,008s
user	0m0,000s
sys	0m0,009s

rmannibucau@rmannibucau-yupiik:/tmp/test $ time /tmp/test/bin/Release/net8.0/linux-x64/publish/test # dotnet native mode (aot)
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,006s
user	0m0,004s
sys	0m0,004s

What's important to note is that we can't conclude dotnet is faster than java in native mode: if you run it 100 times, dotnet will statistically be slower, although some individual runs can be faster. This is why I think the program runs for too short a time: dotnet does so much runtime adjustment that, without tuning, such a short-lived execution shows this instability.

Note: I ran all the benchmarks on Ubuntu with an i9 (16 CPUs) and 64 GB of RAM (indeed way too much for these apps ;)) and no particular tuning.

What is important to me:

1. I guess the OS/machine is key and should be highlighted on the HTML pages
2. the difference is likely not that huge, so something may be fishy in the setup (unless you tested it on Windows, where in my experience the difference can be that large)
3. AOT benchmarking can be neat
4. getting 100 runs and statistics about them can be worthwhile
5. it can be worth ensuring all mains can loop, to get longer durations
6. once 4. is done, it can be interesting to report the error % on the min/max/mean duration (or percentiles)
7. maybe realign the code to ensure the programs are comparable (cs vs java was not 1:1 for bintree, and it had a slight but noticeable impact locally)
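The statistics suggested in the list above (repeated runs, min/max/mean, percentiles, error %) could be computed roughly as follows. This is a minimal Java sketch, assuming the per-run durations have already been collected (for example from 100 separate process runs); the class and method names are hypothetical, and "error %" is defined here, as an assumption, as the half-width of an approximate 95% confidence interval for the mean (1.96 · s / √n) relative to the mean.

```java
import java.util.Arrays;

// Hypothetical summary of repeated benchmark runs; durations in milliseconds.
public class RunStats {
    final double[] sorted;
    final double min, max, mean, stdDev;

    RunStats(double[] durationsMs) {
        if (durationsMs.length < 2) throw new IllegalArgumentException("need at least 2 runs");
        sorted = durationsMs.clone();
        Arrays.sort(sorted);
        min = sorted[0];
        max = sorted[sorted.length - 1];
        mean = Arrays.stream(sorted).average().orElseThrow();
        double sumSq = 0;
        for (double d : sorted) sumSq += (d - mean) * (d - mean);
        stdDev = Math.sqrt(sumSq / (sorted.length - 1)); // sample standard deviation
    }

    // Nearest-rank percentile, p in (0, 100].
    double percentile(double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Half-width of an approximate 95% confidence interval for the mean, as a
    // percentage of the mean (normal approximation; only meaningful with many runs).
    double errorPercent() {
        return 1.96 * stdDev / Math.sqrt(sorted.length) / mean * 100.0;
    }

    public static void main(String[] args) {
        // Illustrative values only, not measurements from this issue.
        RunStats s = new RunStats(new double[] {50, 10, 40, 20, 30});
        System.out.printf("min=%.1f max=%.1f mean=%.1f sd=%.2f p50=%.1f p90=%.1f err=±%.1f%%%n",
                s.min, s.max, s.mean, s.stdDev, s.percentile(50), s.percentile(90), s.errorPercent());
    }
}
```

Nearest-rank percentiles are used because they always return an observed duration, which keeps the report easy to cross-check against raw run logs.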

That said, I still want to say a big thank you, because it is a lot of work and always a very good source to get started when working on these topics.

Jul 05 '24 10:07 rmannibucau

It should be quite obvious that these benchmarks are meant to make certain languages look faster than others. The number of PRs and issues left untouched for 3 years proves that point. I was going to fix some code that I saw as blatantly slow, but it's clear to me that this is a marketing campaign rather than a real desire to get the best out of each language.

Aug 21 '24 23:08 tebrownJHA

@rmannibucau Do you realize that the workload is expected to be set on the command line as a program argument?

@hanabi1224 set N=18 on the command line for binary-trees, and the benchmarks game set N=21.
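To show how the command-line argument drives the workload, here is a minimal Java sketch of binary-trees, assuming the usual benchmarks-game structure (minimum depth 4, a stretch tree of depth N+1, and 2^(N-d+4) trees per depth d). It is not the repository's 2.java; the class and method names are hypothetical, and the default of 6 when no argument is given is an assumption. With N=6 it prints the same four lines as the runs reported above, which is consistent with those runs using N=6 rather than N=18 or N=21.

```java
// Hypothetical binary-trees sketch; depth N is read from the first program argument.
public class BinaryTrees {
    static final int MIN_DEPTH = 4;

    // Tree node; a leaf has both children null.
    record Node(Node left, Node right) {}

    // Builds a perfect binary tree of the given depth (depth 0 = a single leaf).
    static Node bottomUp(int depth) {
        return depth > 0 ? new Node(bottomUp(depth - 1), bottomUp(depth - 1)) : new Node(null, null);
    }

    // Counts the nodes: 2^(depth+1) - 1 for a perfect tree.
    static int check(Node n) {
        return n.left() == null ? 1 : 1 + check(n.left()) + check(n.right());
    }

    static String run(int n) {
        int maxDepth = Math.max(MIN_DEPTH + 2, n);
        int stretchDepth = maxDepth + 1;
        StringBuilder out = new StringBuilder();
        out.append("stretch tree of depth ").append(stretchDepth)
           .append("\t check: ").append(check(bottomUp(stretchDepth))).append('\n');
        Node longLived = bottomUp(maxDepth); // kept alive while short-lived trees are built
        for (int d = MIN_DEPTH; d <= maxDepth; d += 2) {
            int iterations = 1 << (maxDepth - d + MIN_DEPTH);
            int sum = 0;
            for (int i = 0; i < iterations; i++) sum += check(bottomUp(d));
            out.append(iterations).append("\t trees of depth ").append(d)
               .append("\t check: ").append(sum).append('\n');
        }
        out.append("long lived tree of depth ").append(maxDepth)
           .append("\t check: ").append(check(longLived)).append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 6;
        System.out.print(run(n));
    }
}
```

For example, `java BinaryTrees.java 21` runs it at the benchmarks-game size via single-file source launch (JDK 16+ here, because of the record).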

Sep 22 '24 19:09 igouy

@igouy It doesn't critically change the output (I didn't test monstrous numbers either), but I still think the points are valid.

Sep 22 '24 19:09 rmannibucau

You're reporting tiny tiny durations, 1000ths of a second, when the resolution of the time command may only be 1000 Hz.
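One way around the resolution of `time` is to loop the workload inside the process and time each iteration with `System.nanoTime()`, whose Javadoc guarantees resolution at least as good as `System.currentTimeMillis()`. Below is a minimal Java sketch of that idea; the class name, the warm-up count and the placeholder workload are assumptions for illustration, and a harness such as JMH handles JIT effects far more rigorously.

```java
import java.util.function.IntSupplier;

// Hypothetical in-process timing loop: many timed iterations instead of one short process.
public class LoopTimer {
    // Each result is written here so the work is observably used and the JIT
    // cannot simply discard it as dead code.
    static volatile int blackhole;

    // Runs `workload` `warmup` times untimed, then `iterations` timed times;
    // returns the per-iteration durations in nanoseconds.
    static long[] time(IntSupplier workload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            blackhole = workload.getAsInt();
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            blackhole = workload.getAsInt();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption): substitute one benchmark pass here.
        IntSupplier workload = () -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            return sum;
        };
        for (long ns : time(workload, 10, 20)) {
            System.out.printf("%.3f ms%n", ns / 1e6);
        }
    }
}
```

Separating warm-up from timed iterations also makes JIT warm-up visible instead of folding it into a single wall-clock number.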

bintree one where dotnet looks insanely slow

java #7 is insanely fast. The other java and c# binary-trees programs are similar to each other.

Sep 22 '24 23:09 igouy

OK, it seems the results were updated since I ran it, because the numbers were not like these at all (or I misused something). These numbers look OK to me.

Sep 23 '24 05:09 rmannibucau
