# Using `lucid` for HTML generation

This PR uses lucid for HTML generation, mostly as a test.

## Baseline

The baseline is the `ghc-9.4` branch, not my xhtml builder patch.
```
!!! ppHtml: finished in 229.42 milliseconds, allocated 1087.125 megabytes

  16,064,701,512 bytes allocated in the heap
   2,265,714,656 bytes copied during GC
     191,524,344 bytes maximum residency (14 sample(s))
       5,218,824 bytes maximum slop
             493 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3820 colls,     0 par    1.220s   1.222s     0.0003s    0.0035s
  Gen  1        14 colls,     0 par    0.703s   0.704s     0.0503s    0.1284s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    4.838s  (  5.365s elapsed)
  GC      time    1.923s  (  1.925s elapsed)
  EXIT    time    0.001s  (  0.009s elapsed)
  Total   time    6.762s  (  7.300s elapsed)

  Alloc rate    3,320,501,272 bytes per MUT second

  Productivity  71.5% of total user, 73.5% of total elapsed
```
## Lucid

```
!!! ppHtml: finished in 445.94 milliseconds, allocated 2262.033 megabytes

  17,296,688,776 bytes allocated in the heap
   2,251,719,184 bytes copied during GC
     191,537,256 bytes maximum residency (14 sample(s))
       5,267,352 bytes maximum slop
             493 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3969 colls,     0 par    1.232s   1.234s     0.0003s    0.0034s
  Gen  1        14 colls,     0 par    0.711s   0.711s     0.0508s    0.1275s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.068s  (  5.588s elapsed)
  GC      time    1.943s  (  1.945s elapsed)
  EXIT    time    0.001s  (  0.007s elapsed)
  Total   time    7.012s  (  7.540s elapsed)

  Alloc rate    3,412,872,044 bytes per MUT second

  Productivity  72.3% of total user, 74.1% of total elapsed
```
## Comparison

| | Baseline | Patched | Difference | Improvement |
|---|---|---|---|---|
| HTML Time (ms) | 229.42 | 445.94 | -216.52 | -94% |
| HTML Allocations (MB) | 1087.125 | 2262.033 | -1174.908 | -108% |
| Total Allocations (MB) | 16064.0 | 17296.0 | -1232.0 | -8% |
| Max Residency (MB) | 191.0 | 191.0 | 0.0 | 0.0% |
| Total Memory (MiB) | 493.0 | 493.0 | 0.0 | 0.0% |
| Total Time (s) | 6.762 | 7.012 | -0.25 | -3.7% |
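The Difference and Improvement columns in the table above are baseline minus patched, and that difference as a percentage of the baseline, so negative numbers are regressions. A minimal sketch of that arithmetic; the function names are ours, chosen for illustration, not taken from the PR:

```haskell
import Text.Printf (printf)

-- | Baseline minus patched: positive means the patched build used less.
--   (Illustrative helper, not from the PR.)
difference :: Double -> Double -> Double
difference baseline patched = baseline - patched

-- | Relative improvement as a percentage of the baseline.
--   Negative values are regressions.
improvement :: Double -> Double -> Double
improvement baseline patched = 100 * difference baseline patched / baseline

-- | Table rows as (label, baseline value, patched value).
rows :: [(String, Double, Double)]
rows =
  [ ("HTML Time (ms)",         229.42,   445.94)
  , ("HTML Allocations (MB)",  1087.125, 2262.033)
  , ("Total Allocations (MB)", 16064.0,  17296.0)
  , ("Total Time (s)",         6.762,    7.012)
  ]

-- | Render one row in the same column order as the table.
renderRow :: (String, Double, Double) -> String
renderRow (label, b, p) =
  printf "| %s | %.3f | %.3f | %.3f | %.1f%% |"
    label b p (difference b p) (improvement b p)
```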
Very surprising! lucid is about twice as slow.

I suspect that `isNoHtml` is part of that. With the lucid representation, checking for emptiness means running the whole `Seq Attribute -> Builder` function, converting the resulting `Builder` to a `ByteString`, and only then checking whether it is null. With xhtml this check is very cheap: `isNoHtml` is just `null` on a list.
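To make the cost difference concrete, here is a simplified model of the two representations. The types below are hypothetical stand-ins written for illustration, not the real `xhtml` or `lucid` definitions (lucid's actual `HtmlT` is monadic and carries more structure): an xhtml-style value is a list of elements, so emptiness is a `null` check, while a lucid-style value is a function from attributes to a `Builder`, so the only way to ask whether it is empty is to render it.

```haskell
import Data.ByteString.Builder (Builder, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy as BL
import Data.Sequence (Seq)

-- Hypothetical stand-in for xhtml's Html: a list of element fragments.
newtype XHtml = XHtml [String]

-- Cheap: only the list's outermost constructor is inspected,
-- no markup is rendered.
isNoXHtml :: XHtml -> Bool
isNoXHtml (XHtml elems) = null elems

-- Hypothetical stand-in for lucid's representation: given the attributes
-- to attach, produce a Builder.
type Attribute = (String, String)
newtype LHtml = LHtml (Seq Attribute -> Builder)

-- To learn whether an LHtml is empty we must run the function, execute the
-- Builder into a lazy ByteString and test that for emptiness. Laziness means
-- only the first chunk is forced, but that still allocates an output buffer
-- and may render a chunk of markup just to produce a Bool.
isNoLHtml :: LHtml -> Bool
isNoLHtml (LHtml render) = BL.null (toLazyByteString (render mempty))

-- The same small fragment in each representation.
xParagraph :: XHtml
xParagraph = XHtml ["<p>hello</p>"]

lParagraph :: LHtml
lParagraph = LHtml (\_attrs -> stringUtf8 "<p>hello</p>")
```

Both predicates give the same answers; the difference is purely in how much work each has to do to get there, which matters when the check runs on every fragment of every page.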
## Banning `isNoHtml`

Without `isNoHtml` to ruin the party, we get this run:

```
!!! ppHtml: finished in 170.42 milliseconds, allocated 592.030 megabytes

  15,545,553,496 bytes allocated in the heap
   2,222,885,336 bytes copied during GC
     191,520,720 bytes maximum residency (14 sample(s))
       5,242,928 bytes maximum slop
             493 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3694 colls,     0 par    1.198s   1.199s     0.0003s    0.0034s
  Gen  1        14 colls,     0 par    0.715s   0.715s     0.0510s    0.1266s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    4.819s  (  5.332s elapsed)
  GC      time    1.912s  (  1.914s elapsed)
  EXIT    time    0.001s  (  0.004s elapsed)
  Total   time    6.733s  (  7.250s elapsed)

  Alloc rate    3,225,735,117 bytes per MUT second

  Productivity  71.6% of total user, 73.5% of total elapsed
```
| | Baseline | Patched | Difference | Improvement |
|---|---|---|---|---|
| HTML Time (ms) | 229.42 | 170.42 | 59.0 | 26% |
| HTML Allocations (MB) | 1087.125 | 592.03 | 495.095 | 46% |
| Total Allocations (MB) | 16064.0 | 15545.0 | 519.0 | 3% |
| Max Residency (MB) | 191.0 | 191.0 | 0.0 | 0.0% |
| Total Memory (MiB) | 493.0 | 493.0 | 0.0 | 0.0% |
| Total Time (s) | 6.762 | 6.733 | 0.029 | 0.429% |
That's a pretty great improvement!

I'm surprised to see that max residency and total memory are unchanged from the baseline, and total allocations drop by only 3%, despite 46% fewer allocations in `ppHtml`. I guess the streaming `[Char]` representation has some advantages.
It might be difficult to “drop in” lucid in place of xhtml on a codebase that uses `String` everywhere and see an instant performance improvement, because of the `T.pack` calls, both the explicit ones and the implicit conversions in `toHtml`. Each of these creates a copy of the string, which means more allocation.
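To illustrate the extra copy being described: sending a `String` through `T.pack` materialises an intermediate `Text` before it is UTF-8 encoded into the output `Builder`, whereas `Data.ByteString.Builder.stringUtf8` encodes the `String` directly. Both paths produce identical bytes. This sketch ignores HTML escaping, and `viaText` and `direct` are illustrative names, not functions from lucid or Haddock.

```haskell
import Data.ByteString.Builder (Builder, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8Builder)

-- Illustrative: String -> Text -> Builder. T.pack allocates a fresh Text
-- holding a copy of every character, which is then encoded into the
-- Builder's output buffer.
viaText :: String -> Builder
viaText = encodeUtf8Builder . T.pack

-- Illustrative: String -> Builder. Characters are UTF-8 encoded straight
-- into the output buffer, with no intermediate Text.
direct :: String -> Builder
direct = stringUtf8

-- Helper for comparing the rendered bytes.
render :: Builder -> BL.ByteString
render = toLazyByteString
```

The output is the same either way; the `T.pack` path just pays for one more full copy of the input, which is where the extra allocation comes from when it happens for every string on every page.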
I’m not sure whether the GHC API gives you a `FastString` for many things in Haddock. If it does, you could possibly go straight from there to a `Builder` and avoid packing and re-encoding. Either way, it might be possible to reduce how pervasively `String` is used internally within Haddock.
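If text is already available as UTF-8 bytes (in recent GHC versions a `FastString` stores its characters as modified UTF-8), it could in principle be appended to a `Builder` without decoding to `String` or packing to `Text`. A minimal sketch, assuming such bytes are at hand: a plain `ByteString` stands in for them, since pulling in the `ghc` library is out of scope here, and HTML escaping is done on the bytes themselves. That is safe for UTF-8 because the single bytes for `<`, `>`, `&` and `"` never occur inside a multi-byte sequence. `escapeUtf8` and `sampleBytes` are hypothetical helpers, not part of Haddock or lucid.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Builder
  (Builder, byteString, string7, stringUtf8, toLazyByteString, word8)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- ASCII bytes that must be escaped in HTML text and attribute values:
-- '<' (60), '>' (62), '&' (38) and '"' (34).
isSpecial :: Word8 -> Bool
isSpecial w = w == 60 || w == 62 || w == 38 || w == 34

escapeByte :: Word8 -> Builder
escapeByte 60 = string7 "&lt;"
escapeByte 62 = string7 "&gt;"
escapeByte 38 = string7 "&amp;"
escapeByte 34 = string7 "&quot;"
escapeByte w  = word8 w  -- any other byte passes through unchanged

-- Hypothetical helper: escape already-UTF-8-encoded bytes without decoding.
-- Runs of ordinary bytes are copied into the Builder in one piece.
escapeUtf8 :: BS.ByteString -> Builder
escapeUtf8 bs = byteString plain <> escapedRest
  where
    (plain, rest) = BS.break isSpecial bs
    escapedRest = case BS.uncons rest of
      Nothing        -> mempty
      Just (w, more) -> escapeByte w <> escapeUtf8 more

-- Hypothetical stand-in for the UTF-8 bytes one might get from a FastString.
sampleBytes :: BS.ByteString
sampleBytes = BL.toStrict (toLazyByteString (stringUtf8 "λ <T> & \"x\""))
```

Whether Haddock actually has `FastString`s available at the points where it renders HTML is the open question from the comment above; the sketch only shows that, if it does, no `String` or `Text` is needed on the way to the `Builder`.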
Another thing to check: are you writing the resulting HTML directly to e.g. a file via the `Builder`, or are you converting it to another intermediate string type first?
The resulting HTML `Builder` is written directly to disk; I did try to make sure that part was optimal. I do think that the use of `String` everywhere is a performance concern. Pushing `Text` as far back into the code as possible helped a bit, but the source parsers for everything are extremely slow and work over `String`, so making Haddock really fast is going to require broader, "horizontal" performance improvements across the whole pipeline rather than optimizing a single stage.
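For reference, writing a `Builder` straight to a file handle, with no intermediate `String` or `ByteString`, can look like the minimal sketch below. `hPutBuilder` is part of `Data.ByteString.Builder`; the `writeHtml` wrapper, the file name and the page contents are made up for illustration and are not taken from the patch.

```haskell
import Data.ByteString.Builder (Builder, hPutBuilder, stringUtf8)
import System.IO

-- Illustrative wrapper, not Haddock's code. withBinaryFile opens the file in
-- binary mode (no newline or locale translation). hPutBuilder executes the
-- Builder directly on the handle's buffer, and its documentation recommends
-- binary mode plus block buffering, which we set explicitly.
writeHtml :: FilePath -> Builder -> IO ()
writeHtml path html =
  withBinaryFile path WriteMode $ \h -> do
    hSetBuffering h (BlockBuffering Nothing)
    hPutBuilder h html

-- A tiny page to write; the lambda is encoded as two UTF-8 bytes.
examplePage :: Builder
examplePage = stringUtf8 "<!DOCTYPE html><html><body><p>λ</p></body></html>"
```

The alternative of `BL.writeFile path (toLazyByteString html)` also avoids a `String`, but it goes through intermediate lazy `ByteString` chunks instead of filling the handle's buffer directly.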