Java version using fork/join
This is a Java implementation using the Fork/Join framework available in Java 1.7+ (this particular version requires 1.8 or later).
This runs the test ten times to show the effect of a cold vs. warm JVM. On my machine it goes from 250ms to 50ms per run. (By comparison, the .NET Core version takes 500ms.) If I run it thousands of times, it gets down to the 10ms to 15ms range.
Updated to include some minor improvements.
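For readers who have not used Fork/Join, here is a minimal sketch of the general approach. It is not the code that produced the timings in this post. The names `SkynetSketch`, `Node` and `run` are made up for illustration, and the fan-out of 10 with 1,000,000 leaves is an assumption based on the printed result (the sum of 0..999999). It uses `ForkJoinPool.commonPool()`, which needs Java 8.

```java
import java.util.Locale;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SkynetSketch {

    // One node of the tree. It covers `size` consecutive leaf numbers
    // starting at `start`. Assumption: size is a power of 10.
    static final class Node extends RecursiveTask<Long> {
        private final long start;
        private final long size;

        Node(long start, long size) {
            this.start = start;
            this.size = size;
        }

        @Override
        protected Long compute() {
            if (size == 1) {
                return start; // a leaf just reports its own number
            }
            long childSize = size / 10;
            Node[] children = new Node[10];
            for (int i = 0; i < 10; i++) {
                children[i] = new Node(start + i * childSize, childSize);
            }
            // Fork nine children, run the first one on the current thread.
            for (int i = 1; i < 10; i++) {
                children[i].fork();
            }
            long sum = children[0].compute();
            // Join in reverse fork order, the most recently forked task first.
            for (int i = 9; i >= 1; i--) {
                sum += children[i].join();
            }
            return sum;
        }
    }

    // Sums the leaf numbers 0..leaves-1 on the common pool.
    static long run(long leaves) {
        return ForkJoinPool.commonPool().invoke(new Node(0, leaves));
    }

    public static void main(String[] args) {
        // Repeat the run so that JIT warm-up shows up in the timings.
        for (int i = 0; i < 25; i++) {
            long t0 = System.nanoTime();
            long result = run(1_000_000);
            double ms = (System.nanoTime() - t0) / 1e6;
            System.out.println("Result: " + result);
            System.out.println(String.format(Locale.ROOT, "Took: %.2fms", ms));
        }
    }
}
```

Timing each iteration separately is what makes the cold vs. warm difference visible. Timings from this sketch will differ from the numbers in this post, which come from the actual implementation.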
Running the test 25 times I get:
Result: 499999500000
Took: 175.90ms
Result: 499999500000
Took: 118.05ms
Result: 499999500000
Took: 137.63ms
Result: 499999500000
Took: 139.81ms
Result: 499999500000
Took: 67.08ms
Result: 499999500000
Took: 22.92ms
Result: 499999500000
Took: 91.30ms
Result: 499999500000
Took: 25.64ms
Result: 499999500000
Took: 115.90ms
Result: 499999500000
Took: 27.16ms
Result: 499999500000
Took: 19.58ms
Result: 499999500000
Took: 22.93ms
Result: 499999500000
Took: 22.73ms
Result: 499999500000
Took: 18.59ms
Result: 499999500000
Took: 25.52ms
Result: 499999500000
Took: 25.53ms
Result: 499999500000
Took: 23.56ms
Result: 499999500000
Took: 13.28ms
Result: 499999500000
Took: 9.40ms
Result: 499999500000
Took: 10.39ms
Result: 499999500000
Took: 10.90ms
Result: 499999500000
Took: 16.43ms
Result: 499999500000
Took: 9.99ms
Result: 499999500000
Took: 11.90ms
Result: 499999500000
Took: 12.35ms
With all cores at 100%.
As a comparison, I modified the .NET Core version to run 25 times also, and got the following:
499999500000
Async sec: 0.526
499999500000
Async sec: 0.393
499999500000
Async sec: 0.395
499999500000
Async sec: 0.386
499999500000
Async sec: 0.380
499999500000
Async sec: 0.382
499999500000
Async sec: 0.389
499999500000
Async sec: 0.394
499999500000
Async sec: 0.384
499999500000
Async sec: 0.397
499999500000
Async sec: 0.386
499999500000
Async sec: 0.393
499999500000
Async sec: 0.393
499999500000
Async sec: 0.385
499999500000
Async sec: 0.385
499999500000
Async sec: 0.387
499999500000
Async sec: 0.444
499999500000
Async sec: 0.402
499999500000
Async sec: 0.401
499999500000
Async sec: 0.387
499999500000
Async sec: 0.388
499999500000
Async sec: 0.396
499999500000
Async sec: 0.409
499999500000
Async sec: 0.389
499999500000
Async sec: 0.386
With all eight cores averaging 75%.
On the first run out of 25, the Java version is approximately 3x faster (175.90ms vs. 526ms). At the last iteration, the Java version is roughly 30x faster (12.35ms vs. 386ms), and its fastest run (9.40ms) is about 40x faster.
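The ratios can be checked directly from the timings listed above. A small sketch of the arithmetic (the class and method names are made up; the numbers are copied from the two output listings, with the .NET Core seconds converted to milliseconds):

```java
import java.util.Locale;

class SpeedupCheck {

    // How many times faster the quicker run is than the slower one.
    static double speedup(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        // First iteration: .NET Core 0.526 s vs. Java 175.90 ms
        System.out.println(String.format(Locale.ROOT, "first run: %.1fx", speedup(526.0, 175.90)));
        // Last iteration: .NET Core 0.386 s vs. Java 12.35 ms
        System.out.println(String.format(Locale.ROOT, "last run: %.1fx", speedup(386.0, 12.35)));
        // Fastest Java run (9.40 ms) against the last .NET Core time
        System.out.println(String.format(Locale.ROOT, "best Java run: %.1fx", speedup(386.0, 9.40)));
    }
}
```

This prints 3.0x, 31.3x and 41.1x respectively.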
Well, the fork-join implementation blew away the Go, Scala and Java benchmarks on my machine as well.