prettyprinter
prettyprinter copied to clipboard
Optimize the order of constructors in Doc
Starting with GHC 8.10, handling the first 6 constructors of Doc
should be slightly faster than handling the other 7. See this GHC patch for the background on this.
By shuffling around the order of constructors, we might be able to speed up typical documents.
For example, Cat
should probably be in the first group, while Fail
and FlatAlt
could be in the second group without much performance degradation. Annotated
could also be a candidate for the first group.
A little helper function that reports the frequency of constructors would be useful here.
SimpleDocStream
fortunately has 7 constructors, which is exactly the number of constructor tags on 64-bit systems. 0 is the tag for unevaluated thunks.
A little helper function that reports the frequency of constructors would be useful here.
data Frequencies = Frequencies
{ _fail :: !Int
, _empty :: !Int
, _char :: !Int
, _text :: !Int
, _flatAlt :: !Int
, _line :: !Int
, _cat :: !Int
, _nest :: !Int
, _union :: !Int
, _column :: !Int
, _withPageWidth :: !Int
, _nesting :: !Int
, _annotated :: !Int
} deriving Show
emptyFrequencies :: Frequencies
emptyFrequencies = Frequencies 0 0 0 0 0 0 0 0 0 0 0 0 0
addFrequencies :: Frequencies -> Frequencies -> Frequencies
addFrequencies
(Frequencies a0 b0 c0 d0 e0 f0 g0 h0 i0 j0 k0 l0 m0)
(Frequencies a1 b1 c1 d1 e1 f1 g1 h1 i1 j1 k1 l1 m1) =
Frequencies (a0 + a1) (b0 + b1) (c0 + c1) (d0 + d1) (e0 + e1) (f0 + f1) (g0 + g1) (h0 + h1) (i0 +i1) (j0 +j1) (k0 + k1) (l0 + l1) (m0 + m1)
countFrequencies' :: Int -> PageWidth -> Int -> Doc ann -> Frequencies
countFrequencies' column_ pageWidth_ nesting_ x = case x of
Fail -> e { _fail = 1 }
Empty -> e { _empty = 1 }
Char{} -> e { _char = 1 }
Text{} -> e { _text = 1 }
FlatAlt a b -> e { _flatAlt = 1 } `add` count a `add` count b
Line -> e { _line = 1 }
Cat a b -> e { _cat = 1 } `add` count a `add` count b
Nest _ a -> e { _nest = 1 } `add` count a
Union a b -> e { _union = 1 } `add` count a `add` count b
Column f -> e { _column = 1 } `add` count (f column_)
WithPageWidth f -> e { _withPageWidth = 1 } `add` count (f pageWidth_)
Nesting f -> e { _nesting = 1 } `add` count (f nesting_)
Annotated _ a -> e { _annotated = 1 } `add` count a
where
e = emptyFrequencies
add = addFrequencies
count = countFrequencies' column_ pageWidth_ nesting_
countFrequencies :: Doc ann -> Frequencies
countFrequencies = countFrequencies' 10 defaultPageWidth 10
frequenciesToList :: Frequencies -> [(Int, Text)]
frequenciesToList f =
[ (_fail f, "Fail")
, (_empty f, "Empty")
, (_char f, "Char")
, (_text f, "Text")
, (_flatAlt f, "FlatAlt")
, (_line f, "Line")
, (_cat f, "Cat")
, (_nest f, "Nest")
, (_union f, "Union")
, (_column f, "Column")
, (_withPageWidth f, "WithPageWidth")
, (_nesting f, "Nesting")
, (_annotated f, "Annotated")
]
I have used countFrequencies
on some dhall
expressions (as usual):
λ> t <- Data.Text.IO.readFile "/home/simon/src/cpkg/pkgs/pkg-set.dhall"
λ> Right e = exprFromText "" t
λ> Data.List.sort $ frequenciesToList $ countFrequencies (prettyExpr e)
[
( 0
, "WithPageWidth"
)
,
( 5056
, "Fail"
)
,
( 2774984
, "Nest"
)
,
( 9265877
, "Column"
)
,
( 9265877
, "Nesting"
)
,
( 10152653
, "FlatAlt"
)
,
( 10152654
, "Union"
)
,
( 15691638
, "Line"
)
,
( 24664844
, "Empty"
)
,
( 86956579
, "Text"
)
,
( 195717122
, "Char"
)
,
( 199805788
, "Annotated"
)
,
( 302729931
, "Cat"
)
]
In some expressions Column
and Nesting
(introduced quite surely by align
) appear more frequently. This is Prelude.JSON.renderYAML
:
[
( 0
, "WithPageWidth"
)
,
( 124
, "Fail"
)
,
( 143695
, "FlatAlt"
)
,
( 143696
, "Union"
)
,
( 154122
, "Nest"
)
,
( 244368
, "Line"
)
,
( 288541
, "Column"
)
,
( 288541
, "Nesting"
)
,
( 836654
, "Empty"
)
,
( 1288859
, "Text"
)
,
( 2786542
, "Char"
)
,
( 3100284
, "Annotated"
)
,
( 4869155
, "Cat"
)
]
I do wonder whether counting constructor by traversing the entire tree is the right thing to do though. Maybe it would be better to hook into e.g. layoutWadlerLeijen
and count the constructors we pattern-match on there.
In https://github.com/quchen/prettyprinter/commit/e79bb67f5e05f9dc9d0e08335a2517ea392664b0 I have moved the Cat
and Annotated
constructors into the first constructor group. I haven't seen a significant performance difference from this change though. Better benchmarks (https://github.com/quchen/prettyprinter/issues/118) would be helpful!