haddock
Using Lazy Text
I was investigating the weird perf issues with the Builder stuff, and noticed that GHC 9.4.2 and GHC 9.4.3 have very different performance characteristics. With GHC 9.4.2, Builder performs worse. But with GHC 9.4.3, it's significantly better.
After thorough investigation, I figured out that part of the issue was materializing all these [Char]. Particularly, URLs seemed to be trouble - the xhtml interface only allowed you to use String for them, which isn't good. I patched xhtml to use LText for the Attr, which has a nice balance between performance for concatenation and inspection. This rippled through the code, pushing a ton of String into Text.
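To make the String-to-LText change concrete, here is a minimal sketch of an attribute whose value is lazy `Text`. The `Attr` record, `hrefAttr`, and `renderAttr` are hypothetical names for illustration only; they are not the actual xhtml types or the code in the patch, and no HTML escaping is done.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Builder as B

-- Hypothetical stand-in for an xhtml-style attribute (assumption, not the
-- real xhtml type). Both fields are lazy Text rather than String.
data Attr = Attr
  { attrName  :: LT.Text
  , attrValue :: LT.Text
  }

-- Build an href from path segments without materializing a [Char].
hrefAttr :: [LT.Text] -> Attr
hrefAttr segments = Attr "href" (LT.intercalate "/" segments)

-- Render as name="value" into a Builder. No escaping is performed here.
renderAttr :: Attr -> B.Builder
renderAttr (Attr n v) =
  B.fromLazyText n <> B.fromLazyText "=\"" <> B.fromLazyText v <> B.singleton '"'

-- Convenience wrapper that runs the Builder.
renderAttrText :: Attr -> LT.Text
renderAttrText = B.toLazyText . renderAttr
```

Loading this in GHCi and evaluating `renderAttrText (hrefAttr ["doc", "html", "index.html"])` should produce the text `href="doc/html/index.html"`.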
Additionally, I noticed that we were parsing things using Parsec as Text, but then unpacking them into [Char]. So I pushed the Text through the codebase.
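The unpack-then-repack pattern looks roughly like the sketch below. It does not use Parsec and is not taken from the Haddock codebase; `anchorViaString` and `anchorText` are hypothetical helpers that only illustrate the difference between round-tripping through `[Char]` and staying in `Text`.

```haskell
import Data.Char (isSpace, toLower)
import qualified Data.Text as T

-- Round trip: unpack the Text into [Char], process it, then pack it again.
-- This allocates an intermediate linked list of characters.
anchorViaString :: T.Text -> T.Text
anchorViaString = T.pack . map toLower . filter (not . isSpace) . T.unpack

-- Same result, but the value stays in Text the whole way through.
anchorText :: T.Text -> T.Text
anchorText = T.map toLower . T.filter (not . isSpace)
```

Both functions return the same value for a given input; the second simply avoids the intermediate `String`.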
This resulted in a pretty nice improvement in performance. My baseline, using the ghc-9.4 branch:
!!! ppHtml: finished in 225.55 milliseconds, allocated 1087.121 megabytes
haddock ghc-9.4 branch
16,056,038,816 bytes allocated in the heap
2,241,066,456 bytes copied during GC
193,369,736 bytes maximum residency (14 sample(s))
5,134,712 bytes maximum slop
507 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3818 colls, 0 par 1.234s 1.237s 0.0003s 0.0032s
Gen 1 14 colls, 0 par 0.658s 0.658s 0.0470s 0.1228s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.000s elapsed)
MUT time 4.891s ( 5.423s elapsed)
GC time 1.892s ( 1.896s elapsed)
EXIT time 0.001s ( 0.002s elapsed)
Total time 6.785s ( 7.320s elapsed)
Alloc rate 3,282,509,711 bytes per MUT second
Productivity 72.1% of total user, 74.1% of total elapsed
Documentation created:
/home/matt/Projects/persistent/dist-newstyle/build/x86_64-linux/ghc-9.4.3/persistent-2.14.4.3/doc/html/persistent/index.html
This branch (which does include the #1552 code, too) has these numbers:
!!! ppHtml: finished in 172.81 milliseconds, allocated 888.132 megabytes
16,042,480,936 bytes allocated in the heap
2,227,119,928 bytes copied during GC
190,551,728 bytes maximum residency (14 sample(s))
5,122,384 bytes maximum slop
482 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 3745 colls, 0 par 1.222s 1.224s 0.0003s 0.0031s
Gen 1 14 colls, 0 par 0.703s 0.703s 0.0502s 0.1366s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.001s elapsed)
MUT time 4.974s ( 5.507s elapsed)
GC time 1.925s ( 1.927s elapsed)
EXIT time 0.001s ( 0.006s elapsed)
Total time 6.901s ( 7.440s elapsed)
Alloc rate 3,225,022,895 bytes per MUT second
Productivity 72.1% of total user, 74.0% of total elapsed
| | Baseline | Lazy Text | Difference | Improvement |
|---|---|---|---|---|
| HTML time | 225.55 ms | 172.81 ms | 52.74 ms | 23% |
| HTML allocation | 1087.121 MB | 888.132 MB | 198.99 MB | 18.3% |
| Max residency | 193,369,736 B | 190,551,728 B | 2.69 MiB | 1.5% |
| Total memory | 507 MiB | 482 MiB | 25 MiB | 4.93% |
| Allocations | 16,056,038,816 B | 16,042,480,936 B | ~12.9 MiB | 0.08% |
| Time | 6.785 s | 6.901 s | -0.116 s | -1.71% |
23% faster and 18.3% less memory while generating HTML, though the rest of the code is a tiny bit slower.
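For anyone checking the percentages, they are the relative change against the baseline. A small sketch of that arithmetic (`improvementPct` is just an illustrative helper name, not part of Haddock):

```haskell
-- Relative improvement of a new measurement over a baseline, in percent.
-- Positive means the new value is smaller (better); negative means worse.
improvementPct :: Double -> Double -> Double
improvementPct baseline new = (baseline - new) / baseline * 100

-- Figures taken from the ppHtml timing lines and the RTS summaries above.
htmlTimePct, htmlAllocPct, totalTimePct :: Double
htmlTimePct  = improvementPct 225.55 172.81     -- about 23.4
htmlAllocPct = improvementPct 1087.121 888.132  -- about 18.3
totalTimePct = improvementPct 6.785 6.901       -- about -1.7 (slower)
```

The negative value for total time is the "tiny bit slower" part: the overall run went from 6.785s to 6.901s.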
There's a long way to go towards making HTML generation more efficient. Timing data shows that we allocate roughly 200 times as much memory as the final HTML file weighs - so a ~1.1MB file on disk allocates ~222MB.
@parsonsmatt Great PR, thanks a lot! I have an adjacent question: Have you considered using OsPath instead of FilePath?
Am I misunderstanding the numbers here? It seems like the patch makes things slower overall.
9.4.2 had a bug in eta-expansion which was fixed in 9.4.3 (which affected the Builder performance).
It is also impossible to review this patch with the current commit history. Can you please clean that up if you're going to pursue this?
@Kleidukos I'm not aware of OsPath and a Stackage search doesn't bring anything up.
@mpickering This PR is based on two other PRs, and when those are merged, this will be much easier to review. While the overall runtime is slightly slower in this case, the benefit is huge for packages that generate large documentation pages, project-specific Prelude-like modules in particular. Once I get the work app using 9.4.3 or 9.4.4, I can get more up-to-date perf numbers.
> @Kleidukos I'm not aware of `OsPath` and a Stackage search doesn't bring anything up.
Hoogle is more reliable then, it's a type from https://flora.pm/packages/@haskell/filepath.
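For reference, a minimal sketch of what using `OsPath` looks like, assuming a `filepath` version that ships the `System.OsPath` module (1.4.100 or later, which is newer than the one bundled with GHC 9.4). `docIndex` is a hypothetical helper, not something from Haddock.

```haskell
import System.OsPath (OsPath, decodeUtf, encodeUtf, (</>))

-- Join a documentation root with "index.html" using OsPath, converting
-- from and back to FilePath at the edges. encodeUtf and decodeUtf run in
-- any MonadThrow; IO is used here so encoding failures become exceptions.
docIndex :: FilePath -> IO FilePath
docIndex root = do
  rootPath <- encodeUtf root
  leaf     <- encodeUtf "index.html"
  decodeUtf (joinDoc rootPath leaf)

-- Pure path manipulation stays entirely in OsPath.
joinDoc :: OsPath -> OsPath -> OsPath
joinDoc dir file = dir </> file
```

On a POSIX system, `docIndex "doc/html"` should return `"doc/html/index.html"`; on Windows the separator differs.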