bytestring icon indicating copy to clipboard operation
bytestring copied to clipboard

Builder literals are noticeably slower to compile than ByteString literals

Open mignon-p opened this issue 3 years ago • 3 comments

This is in relation to a recent discussion on the Haskell Discourse. There were two similar-but-different issues discussed in that thread, so I am filing two GitHub issues. (The other one is #495.)

This issue is that it's noticeably faster (at compile time) to use the IsString instance for ByteString and then call byteString on it, than it is to use the IsString instance for Builder directly. That seems surprising to me, and probably indicates an inefficiency in the IsString instance for Builder.

For example:

[2 of 3] Compiling Experiment.Medium               ( 1.53 s)
[3 of 3] Compiling Experiment.Slow                 ( 2.18 s)

Slow is using the IsString instance for Builder, while Medium is using the IsString instance for ByteString, and then converting it to a Builder.

The code that demonstrates this behavior is in my strlit-test repository.

I'm using GHC 8.10.7 and bytestring-0.10.12.0.

mignon-p avatar Mar 08 '22 02:03 mignon-p

Thanks for the great testcase. Thank you for preparing it. I looked briefly at this and found that the issue is largely limited to bytestring-0.10. Happily, the situation is quite a bit better with 0.11. This can be seen by comparing the compilation time of the testcase when compiled with GHC 8.10.7 and 9.2.1 (while ship bytestring 0.10 and 0.11, respectively):

  slow medium fast
8.10.7 2.6 1.8 0.5
9.0.2 3.3 2.5 1.3
9.2.1 1.6 2.3 1.5

There seem to be a few reasons for the differences here: In 8.10.7 and 9.0.2 the code generated in the slow case is truly awful, e.g.:

kOn3 = "ON"#
kOn2 = unpackCString# kOn3
kOn1
  = \ @ r_X1gu w_scNh w1_scNi w2_scNj ->
      case w1_scNi of { BufferRange ww1_scNm ww2_scNn ->
      $wstep kOn2 w_scNh ww1_scNm ww2_scNn w2_scNj
      }
kOn = kOn1 `cast` <Co:17>

That is, the builder unpacks the primitive string into a String, which then gets copied byte-by-byte into the output buffer. The fact that this went unnoticed for as long as it did surely is rather alarming as there is nothing particularly odd about slow; it is using Builder precisely as it was intended to be used.

bgamari avatar Mar 15 '22 22:03 bgamari

Thanks for looking into this, @bgamari!

It seems to me that there's nothing to fix in the latest bytestring, but that it would be good to add some regression tests:

  1. Builder literals should be efficient and not do byte-by-byte copying behavior of GHC 8.10 and 9.0. Can this be checked with inspection-testing in some way?! Otherwise, we can check the generated Core directly.

  2. Since the original issue was about compiler perf, it would be good to add compiler perf tests for Builder literals. These should be added to GHC's existing perf testsuite.

sjakobi avatar Apr 02 '22 20:04 sjakobi

IMHO inspection-testing is quite underused. It would be great if someone could write some tests for this case.

However, even if not it should be easy to detect this sort of regression in allocations numbers from benchmarks.

bgamari avatar Apr 08 '22 03:04 bgamari