Builder literals are noticeably slower to compile than ByteString literals
This is in relation to a recent discussion on the Haskell Discourse. There were two similar-but-different issues discussed in that thread, so I am filing two GitHub issues. (The other one is #495.)
This issue is that it's noticeably faster (at compile time) to use the IsString instance for ByteString and then call byteString on it, than it is to use the IsString instance for Builder directly. That seems surprising to me, and probably indicates an inefficiency in the IsString instance for Builder.
For example:
[2 of 3] Compiling Experiment.Medium ( 1.53 s)
[3 of 3] Compiling Experiment.Slow ( 2.18 s)
Slow is using the IsString instance for Builder, while Medium is using the IsString instance for ByteString, and then converting it to a Builder.
The code that demonstrates this behavior is in my strlit-test repository.
I'm using GHC 8.10.7 and bytestring-0.10.12.0.
Thanks for the great testcase. Thank you for preparing it. I looked briefly at this and found that the issue is largely limited to bytestring-0.10. Happily, the situation is quite a bit better with 0.11. This can be seen by comparing the compilation time of the testcase when compiled with GHC 8.10.7 and 9.2.1 (while ship bytestring 0.10 and 0.11, respectively):
| slow | medium | fast | |
|---|---|---|---|
| 8.10.7 | 2.6 | 1.8 | 0.5 |
| 9.0.2 | 3.3 | 2.5 | 1.3 |
| 9.2.1 | 1.6 | 2.3 | 1.5 |
There seem to be a few reasons for the differences here: In 8.10.7 and 9.0.2 the code generated in the slow case is truly awful, e.g.:
kOn3 = "ON"#
kOn2 = unpackCString# kOn3
kOn1
= \ @ r_X1gu w_scNh w1_scNi w2_scNj ->
case w1_scNi of { BufferRange ww1_scNm ww2_scNn ->
$wstep kOn2 w_scNh ww1_scNm ww2_scNn w2_scNj
}
kOn = kOn1 `cast` <Co:17>
That is, the builder unpacks the primitive string into a String, which then gets copied byte-by-byte into the output buffer. The fact that this went unnoticed for as long as it did surely is rather alarming as there is nothing particularly odd about slow; it is using Builder precisely as it was intended to be used.
Thanks for looking into this, @bgamari!
It seems to me that there's nothing to fix in the latest bytestring, but that it would be good to add some regression tests:
-
Builder literals should be efficient and not do byte-by-byte copying behavior of GHC 8.10 and 9.0. Can this be checked with
inspection-testingin some way?! Otherwise, we can check the generated Core directly. -
Since the original issue was about compiler perf, it would be good to add compiler perf tests for
Builderliterals. These should be added to GHC's existing perf testsuite.
IMHO inspection-testing is quite underused. It would be great if someone could write some tests for this case.
However, even if not it should be easy to detect this sort of regression in allocations numbers from benchmarks.