scala-benchmarks

Add cross-compilation support for Scala-2.13.0-M5

Open dmit opened this issue 6 years ago • 9 comments

Since the first Scala 2.13 Release Candidate is coming soon™ and brings with it the reworked collections library, I thought it would be interesting to add 2.13 as a cross target for this project. 2.12 remains the default.

When running benchmarks compiled with Scala 2.13, the newly introduced s.c.i.LazyList will be used instead of the deprecated s.c.i.Stream. In order to keep the diff small and avoid code duplication, this is done by simply aliasing Stream to LazyList for 2.13 builds. This means that the corresponding benchmarks will still be named "stream*".
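The aliasing described above could look roughly like this. It is a sketch, not the code from this PR: `compat` and `AliasDemo` are made-up names, and it only compiles on Scala 2.13, where `LazyList` exists.

```scala
// Hypothetical compatibility shim for 2.13 builds only.
// `compat` is an illustrative name, not taken from this project.
object compat {
  // Type alias: code that mentions Stream[A] now means LazyList[A].
  type Stream[+A] = scala.collection.immutable.LazyList[A]
  // Companion alias: Stream.from, Stream.range, etc. resolve to LazyList's.
  val Stream = scala.collection.immutable.LazyList
}

object AliasDemo {
  // The explicit import shadows the deprecated scala.Stream in this scope.
  import compat._

  // Benchmark-style code keeps using the name "Stream" unchanged.
  def firstSquares(n: Int): List[Int] =
    Stream.from(1).map(x => x * x).take(n).toList

  def main(args: Array[String]): Unit =
    println(firstSquares(3)) // prints List(1, 4, 9)
}
```

With a shim like this in the 2.13 source set, the benchmark bodies and their "stream*" names stay identical across both versions.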

If there is interest, I was also thinking about adding benchmarks for the new immutable array wrapper s.c.i.ArraySeq as well as cats.data.Chain, which advertises O(1) concat, O(1) append, and amortized O(1) uncons.

dmit avatar Jan 24 '19 10:01 dmit

Thank you! I will get this in as soon as I can.

fosskers avatar Jan 24 '19 16:01 fosskers

How does one invoke compilation with one or the other, again? I'd need to be able to generate the benchmark numbers for either version with minimal hassle.

fosskers avatar Jan 24 '19 16:01 fosskers

Prepending a + to an sbt command performs it for all targets: sbt +clean +compile.

Double plus specifies a single Scala version to use: sbt "++2.13.0-M5 compile".

Since 2.12 is the default, there is no need to do anything extra if that is the version needed.
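For context, a cross-build setup of this kind usually comes down to two sbt settings. A minimal sketch, assuming 2.12.8 and 2.13.0-M5 as the exact pinned versions (the project's actual build file may differ):

```scala
// build.sbt (sketch; version numbers are assumptions based on this thread)

// Default version, used by plain `sbt compile`.
scalaVersion := "2.12.8"

// Versions visited by `+`-prefixed commands such as `sbt +compile`;
// `++2.13.0-M5` switches the session to a single entry.
crossScalaVersions := Seq("2.12.8", "2.13.0-M5")
```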

dmit avatar Jan 24 '19 16:01 dmit

Added instructions on how to run these benchmarks on different versions to the README.

Also, I don't have access to hardware where I can run the whole benchmark suite without external interference, but here are the StreamBench results for 2.12.8 and 2.13.0-M5 on my i7-6700k desktop:

2.12.8:

| Benchmark | List | IList | Vector | Array | Stream | EStream | Iterator |
|---|---|---|---|---|---|---|---|
| Head | 145.561 | 213.727 | 124.877 | 176.047 | 0.049 | 0.165 | 0.021 |
| Max | 171.946 | 249.249 | 227.556 | 202.556 | 763.730 | 1406.926 | 130.229 |
| Reverse | 32.279 | 29.436 | 146.453 | 35.448 | 321.679 | 312.288 | |
| Sort | 205.785 | 366.869 | 251.272 | 252.490 | 1110.265 | | |

2.13.0-M5:

| Benchmark | List | IList | Vector | Array | LazyList | EStream | Iterator |
|---|---|---|---|---|---|---|---|
| Head | 157.581 | 213.973 | 111.111 | 100.743 | 0.204 | 0.155 | 0.020 |
| Max | 173.365 | 250.070 | 129.923 | 139.667 | 1308.481 | 1388.734 | 123.437 |
| Reverse | 28.474 | 28.921 | 81.615 | 33.027 | 209.873 | 318.322 | |
| Sort | 210.450 | 347.894 | 153.390 | 126.459 | 1440.950 | | |

Looks like Vector got quite a bit faster, LazyList is slower than the Stream it replaces everywhere except Reverse (although the semantics are different), Array somehow got faster (?), and the rest are mostly the same as before.

dmit avatar Jan 24 '19 17:01 dmit

Thanks! I'll run these on my own machine too, and see what we see. It's good to see that Vector is faster - I've been quite against that data structure in general (i.e. I wasn't convinced it had a use-case).

fosskers avatar Jan 24 '19 17:01 fosskers

I think the main case for Vector in Scala <=2.12 was random access for sufficiently large N, especially when the vector was shared among threads or actors and the immutable nature of the data structure was paramount.

Looks like in 2.13 that use case is even more viable.

dmit avatar Jan 24 '19 18:01 dmit

Just as much could be accomplished with Array (with an immutable wrapper, if one wanted), couldn't it?

fosskers avatar Jan 24 '19 20:01 fosskers

Absolutely, if access is read-only. But if each thread/execution context needs to make minor changes to its copy of the collection, then arrays quickly get too expensive memory-wise. Speaking of which, don't forget that List also takes about twice as much memory as a Vector.

That's my intuition for Vector's usefulness - shared collections that have thousands of elements or more, and allow localized changes without copying the whole thing. Of course now that I've written out all those caveats, it seems that the actual real-life use cases for Vector are pretty rare. When you get far enough to worry about performance characteristics of Vector, you probably just want to implement a custom collection type that best meets your needs (and is probably based on Array).
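To make the "localized changes without copying the whole thing" point concrete, here is a small sketch contrasting the two approaches. The object and method names are illustrative only, and it says nothing about actual memory or timing numbers.

```scala
object SharedUpdateDemo {
  // Vector.updated returns a new Vector that reuses most of the original's
  // internal tree; only the path to the changed element is copied.
  def patchVector(shared: Vector[Int], index: Int, value: Int): Vector[Int] =
    shared.updated(index, value)

  // With a plain Array, keeping the shared original intact means cloning
  // every element before making the local change.
  def patchArray(shared: Array[Int], index: Int, value: Int): Array[Int] = {
    val copy = shared.clone()
    copy(index) = value
    copy
  }

  def main(args: Array[String]): Unit = {
    val base  = Vector.tabulate(100000)(identity)
    val local = patchVector(base, 42, -1)
    // The shared original is untouched; only the local view sees the change.
    println((base(42), local(42))) // prints (42,-1)
  }
}
```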

I think we both agree that if RAM usage is not a concern, contiguous chunks of memory + memcpy is a great approach on modern hardware.

dmit avatar Jan 25 '19 22:01 dmit

I updated a few things on master. I'll hijack this PR and get it merged. I also need to figure out a nice layout for posting the various results for each Scala version in a way that's easy to visually parse.

fosskers avatar Apr 30 '19 04:04 fosskers