dotty-feature-requests
Benchmark more things
Here are some suggestions of things we could add to our benchmarks:
- [ ] Benchmark against scalac
- [ ] Benchmark against a non-bootstrapped dotty
- [x] Benchmark against an optimized dotty
- [x] Benchmark more code, like all the projects in the community build: https://github.com/lampepfl/dotty-community-build
- [ ] Benchmark re-using the same compiler instance like in https://github.com/scala/compiler-benchmark/pull/39
- [x] Compare the results we get from https://github.com/liufengyun/bench and https://github.com/scala/compiler-benchmark to get more confidence that we're benchmarking the correct thing.
Thanks for making the list @smarter. I see some items are actionable, but others need more thought on how to accommodate them without compromising the maintainability of the bench infrastructure.
The new bench project adopts a data-centered design, where the whole system is built around the CSV file. This design enables us to easily develop features like PR trackability and open-PR testing, as well as to make the web UI project-agnostic. That's why I didn't use the scala/compiler-benchmark project, which is not easy to customise to support the features we want.
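To make the data-centred idea concrete, here is a minimal sketch of what "designing the whole system around the CSV file" could look like. The column names (`commit`, `pr`, `benchmark`, `timeMs`) and the `BenchRow`/`BenchCsv` names are illustrative assumptions, not the actual schema or code of dotty-bench: the runner only appends rows, and every other feature (charts, PR tracking) reads the file back independently.

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}

// One benchmark run = one row. Hypothetical schema, for illustration only.
case class BenchRow(commit: String, pr: Int, benchmark: String, timeMs: Double)

object BenchCsv {
  def toLine(r: BenchRow): String =
    s"${r.commit},${r.pr},${r.benchmark},${r.timeMs}"

  def fromLine(line: String): BenchRow = {
    val Array(c, pr, b, t) = line.split(",")
    BenchRow(c, pr.toInt, b, t.toDouble)
  }

  // The runner only ever appends; consumers (web UI, regression checks)
  // only ever read, so the two sides stay decoupled.
  def append(path: String, row: BenchRow): Unit =
    Files.write(Paths.get(path), (toLine(row) + "\n").getBytes,
      StandardOpenOption.CREATE, StandardOpenOption.APPEND)
}
```

Because the CSV is the single interface between producer and consumers, adding a feature like open-PR testing only means writing rows with a different `pr` value, without touching the chart code.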
The experience with the previous benchmark infrastructure made me realise that maintainability is a major issue for compiler bench infrastructure. Thus, while I'm open to new features, I'd argue that any new feature should follow the same design philosophy without sacrificing maintainability.
Some detailed feedback regarding the items:
Benchmark against scalac
What exactly does this mean? Which version should we benchmark? Do we show a line for it? What do we do with test cases that don't compile with scalac?
Benchmark against a non-bootstrapped dotty; Benchmark against an optimized dotty
These two require changes to the CSV file to allow multiple lines in a chart. A potential problem here is the misalignment of points. I need to investigate more to evaluate the scope of the required changes and their implications for maintainability.
Benchmark all the projects in the community build
A concern here is that the community build breaks from time to time; if we add those projects to the benchmarks, we will have to disable the community build in bench from time to time. But if that's acceptable, then it's not a problem.
Benchmark re-using the same compiler instance
Could you please detail what this means for Dotty? Does it mean reusing the context?
Compare the results from liufengyun/bench and scala/compiler-benchmark to get more confidence that we're benchmarking the correct thing
Maybe I misunderstood: since we are testing Dotty while the other project is testing scalac, and they run on different machines, it's not easy to draw meaningful conclusions from the comparison.
> What exactly does this mean? Which version should we benchmark? Do we show a line for it? What do we do with test cases that don't compile with scalac?
Good questions! :) I don't have any strong opinion here. I think it'd be interesting to just run Scala 2.12.3 once on whatever test cases we can get it to work on and display that. This way we have some idea of how much better or worse we are.
> A concern here is that the community build breaks from time to time; if we add those projects to the benchmarks, we will have to disable the community build in bench from time to time. But if that's acceptable, then it's not a problem.
I think that's OK.
> Benchmark re-using the same compiler instance
>
> Could you please detail what this means for Dotty? Does it mean reusing the context?
Yes, re-using the same root context like we do in the REPL and the IDE.
> Maybe I misunderstood: since we are testing Dotty while the other project is testing scalac, and they run on different machines, it's not easy to draw meaningful conclusions from the comparison.
The other project now also supports dotty: https://github.com/scala/compiler-benchmark/pull/31
And apparently dotty does pretty badly there (Jason said "scalac compile times are is 0.65x that of 0.3.0-RC1"), so it's worth seeing if we can reproduce their results.
I think it'd be interesting to benchmark compiling re2s too. It's the project used for benchmarking rsc: https://github.com/twitter/reasonable-scala/tree/performance
I have a branch of re2s that compiles with Dotty at https://github.com/smarter/reasonable-scala/commits/dotty-re2s (the code is in https://github.com/smarter/reasonable-scala/tree/dotty-re2s/examples/re2s)
Now we have benchmarks for optimised dotty. In the `time` mode, the related lines (optimised, bootstrapped) are shown in the same chart. There's a switch in the sidebar of the UI to switch between `commit` and `time` mode.
http://dotty-bench.epfl.ch/