Using `aeson` for JSON

Open parsonsmatt opened this issue 2 years ago • 3 comments

Fast JSON

As usual, testing on persistent.

With --quickjump:

--quickjump is what actually uses the JSON stuff, so let's try running with that.

Baseline

time cabal haddock --haddock-options "-v2 --quickjump +RTS -s -RTS --optghc=\"-v2\" --lib=/home/matt/Projects/haddock/haddock-api/resources" persistent --with-haddock=/home/matt/.cabal/bin/haddock-943
!!! renameAllInterfaces: finished in 112.10 milliseconds, allocated 232.280 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 1568.96 milliseconds, allocated 7682.663 megabytes
*** ppHtml:
*** ppHtmlContents:
!!! ppHtmlContents: finished in 0.73 milliseconds, allocated 2.298 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 26.43 milliseconds, allocated 129.281 megabytes

  24,248,235,776 bytes allocated in the heap
   3,644,045,200 bytes copied during GC
     379,247,624 bytes maximum residency (16 sample(s))
       5,837,816 bytes maximum slop
             968 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5794 colls,     0 par    1.573s   1.575s     0.0003s    0.0028s
  Gen  1        16 colls,     0 par    1.255s   1.498s     0.0936s    0.3925s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.647s  (  6.241s elapsed)
  GC      time    2.828s  (  3.074s elapsed)
  EXIT    time    0.001s  (  0.005s elapsed)
  Total   time    8.476s  (  9.320s elapsed)

  Alloc rate    4,294,171,488 bytes per MUT second

  Productivity  66.6% of total user, 67.0% of total elapsed

We generate the JSON index twice, which seems weird. The two calls have very different costs (1568.96 ms vs. 26.43 ms in the run above). It feels like they should be folded into a single pass.

  • JSON Index Time: 1568.96ms
  • JSON Index Alloc: 7682.663 MB
  • Total Memory: 968 MB
  • Total Allocations: 24,248 MB
  • Total Time: 8.476 s

With aeson and toEncoding

!!! renameAllInterfaces: finished in 108.26 milliseconds, allocated 232.280 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 365.63 milliseconds, allocated 811.310 megabytes
*** ppHtml:
*** ppHtmlContents:
!!! ppHtmlContents: finished in 1.07 milliseconds, allocated 2.298 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 21.48 milliseconds, allocated 109.262 megabytes

  17,022,183,480 bytes allocated in the heap
   2,663,379,568 bytes copied during GC
     248,824,560 bytes maximum residency (15 sample(s))
       5,737,744 bytes maximum slop
             586 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4043 colls,     0 par    1.313s   1.319s     0.0003s    0.0028s
  Gen  1        15 colls,     0 par    0.877s   0.880s     0.0587s    0.1663s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time    5.090s  (  5.627s elapsed)
  GC      time    2.190s  (  2.199s elapsed)
  EXIT    time    0.001s  (  0.003s elapsed)
  Total   time    7.282s  (  7.830s elapsed)

  Alloc rate    3,343,944,691 bytes per MUT second

  Productivity  69.9% of total user, 71.9% of total elapsed
  • JSON Index Time: 365.63ms
  • JSON Index Allocations: 811.310 MB
  • Total Memory: 586 MB
  • Total Allocations: 17,022 MB
  • Total Time: 7.282s

Comparison

| Metric | Baseline | Patched | Difference | Improvement |
|---|---|---|---|---|
| JSON Index Time (ms) | 1568.96 | 365.63 | 1203.33 | 76.7% |
| JSON Index Allocations (MB) | 7682.663 | 811.310 | 6871.353 | 89.4% |
| Total Allocations (MB) | 24248.0 | 17022.0 | 7226.0 | 29.8% |
| Max Residency (MB) | 379.0 | 248.0 | 131.0 | 34.6% |
| Total Memory (MiB) | 968.0 | 586.0 | 382.0 | 39.5% |
| Total Time (s) | 8.476 | 7.282 | 1.194 | 14.1% |

Overall a pretty large improvement.
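
For reference, the Difference and Improvement columns are plain arithmetic on the baseline and patched figures. A minimal sketch of that calculation follows; the function names `difference` and `improvement` are made up for illustration and are not part of Haddock or the benchmark tooling.

```haskell
-- Hypothetical helpers, for illustration only.

-- Absolute saving: baseline minus patched, in whatever unit the inputs use.
difference :: Double -> Double -> Double
difference baseline patched = baseline - patched

-- Relative saving, as a percentage of the baseline.
improvement :: Double -> Double -> Double
improvement baseline patched = 100 * difference baseline patched / baseline
```

In GHCi, `improvement 1568.96 365.63` evaluates to roughly 76.7, matching the JSON index time row.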

Unfortunately, using aeson directly is likely out of the question - there are far too many dependencies. Even if we were to core out the important bits, that would require unordered-containers, vector, and attoparsec, none of which are boot packages.

Still, I think this PR demonstrates that a faster JSON representation is worth pursuing.

parsonsmatt, Dec 27 '22 21:12

Thanks for taking the time @parsonsmatt, this is really appreciated. :)

Kleidukos, Dec 27 '22 22:12

Re: aeson's dependency footprint: to fight that, there once was https://github.com/haskell-hvr/microaeson. I wonder how it compares performance-wise...

ulysses4ever, Dec 27 '22 23:12

I'm not sure - #1559 uses Text and Map and doesn't get close. I suspect the machinery in toEncoding and the other Value -> Builder is the secret sauce.

I did actually try toEncoding . toJSON. It was slower, but only ~10% worse than the direct toEncoding variant.

The implementation of toEncoding on an [a] is pretty simple:

list :: (a -> Encoding) -> [a] -> Encoding
list _  []     = emptyArray_
list to' (x:xs) = openBracket >< to' x >< commas xs >< closeBracket
  where
    commas = foldr (\v vs -> comma >< to' v >< vs) empty
{-# INLINE list #-}

>< is just <> on the underlying Builder, wrapped back up in Encoding.
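
To make that shape concrete, here is a minimal sketch of the same idea built only on `Data.ByteString.Builder` from bytestring, which is a boot package. The `Encoding` newtype, `int`, and `render` below are hypothetical stand-ins written for illustration. This is not aeson's actual code, and not something Haddock ships.

```haskell
import Data.ByteString.Builder (Builder, char7, intDec, string7, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical stand-in for aeson's Encoding: a thin wrapper over a Builder.
newtype Encoding = Encoding Builder

-- Append two encodings by appending the underlying builders.
(><) :: Encoding -> Encoding -> Encoding
Encoding a >< Encoding b = Encoding (a <> b)
infixr 6 ><

-- Primitive pieces of JSON array syntax.
empty, comma, openBracket, closeBracket, emptyArray_ :: Encoding
empty        = Encoding mempty
comma        = Encoding (char7 ',')
openBracket  = Encoding (char7 '[')
closeBracket = Encoding (char7 ']')
emptyArray_  = Encoding (string7 "[]")

-- Encode an Int as a JSON number.
int :: Int -> Encoding
int = Encoding . intDec

-- Same structure as the list function quoted above: no intermediate
-- JSON value is built, the elements are streamed straight into the Builder.
list :: (a -> Encoding) -> [a] -> Encoding
list _   []     = emptyArray_
list to' (x:xs) = openBracket >< to' x >< commas xs >< closeBracket
  where
    commas = foldr (\v vs -> comma >< to' v >< vs) empty

-- Run the Builder and return the JSON text (assumption: ASCII-only output).
render :: Encoding -> String
render (Encoding b) = BL.unpack (toLazyByteString b)
```

Loaded in GHCi, `render (list int [1, 2, 3])` gives `"[1,2,3]"`. The point of the sketch is only that the encoder never materialises a tree of JSON values, which is plausibly where most of the allocation savings come from.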

parsonsmatt, Dec 27 '22 23:12

Hi, thank you for this PR, but Haddock now lives full-time in the GHC repository! Read more at https://discourse.haskell.org/t/haddock-now-lives-in-the-ghc-repository/9576.

Let me know if you feel it is still needed, and I'll migrate it. :)

Kleidukos, May 18 '24 12:05