
Compile time increases related to changes between `text-1.2.5.0` and `text-2.0.1`

Open • sjakobi opened this issue

The compile time for `texmath-0.12.5.2` with GHC 9.2.4 roughly doubles, from 68s to 141s, when I use `text-2.0.1` instead of `text-1.2.5.0`. A module that is particularly affected is `Text.TeXMath.Readers.MathML.EntityMap`, which has a lot of overloaded `Text` literals. For this module, GHC's `-dshow-passes` output shows a large increase in Core size:

With `text-1.2.5.0`:

```
Result size of CorePrep
  = {terms: 35,732, types: 33,586, coercions: 0, joins: 0/7}
```

With `text-2.0.1`:

```
Result size of CorePrep
  = {terms: 515,697,
     types: 277,992,
     coercions: 0,
     joins: 4,163/4,182}
```
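A minimal module in the same spirit as `EntityMap` could be used to compare Core sizes between the two `text` versions by compiling it with `ghc -O1 -dshow-passes` and reading the `Result size of CorePrep` line. This is a hypothetical sketch: the table, its entries and the names `entities` and `lookupEntity` are made up for illustration and are not taken from texmath, and a real comparison would likely need many more literals before the difference becomes noticeable.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical reproduction in the spirit of
-- Text.TeXMath.Readers.MathML.EntityMap: one table of overloaded
-- Text literals. Compare Core sizes with:  ghc -O1 -dshow-passes
import qualified Data.Map.Strict as M
import Data.Text (Text)

-- Every string literal below becomes a 'fromString' call at type Text.
entities :: M.Map Text Text
entities = M.fromList
  [ ("amp", "&")
  , ("lt", "<")
  , ("gt", ">")
  , ("quot", "\"")
  , ("alpha", "\945")   -- non-ASCII value: U+03B1
  , ("beta", "\946")    -- U+03B2
  , ("infin", "\8734")  -- U+221E
  ]

-- Look up an entity name in the table.
lookupEntity :: Text -> Maybe Text
lookupEntity k = M.lookup k entities
```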

(CCing the texmath maintainer, @jgm, and @mpickering, who is both an author of texmath and knows a lot about GHC performance)

sjakobi · Aug 11 '22 12:08

I think the emojis package might have the same issue.

mpickering · Aug 11 '22 12:08

> I think the emojis package might have the same issue.

Indeed. On my machine, the compile time for `emojis-0.1.2` increased from 12s to 90s.

sjakobi · Aug 11 '22 12:08

It seems that overloaded literals are handled in a streaming fashion, like this:

```haskell
instance IsString Text where
    fromString = pack

pack :: String -> Text
pack = unstream . S.map safe . S.streamList
```
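To see why literal-heavy modules are hit so hard, it helps to spell out what an overloaded literal turns into. The sketch below writes by hand roughly what GHC generates for `"hello" :: Text` under `OverloadedStrings`; the exact Core varies between GHC versions, and the names `helloLiteral` and `helloViaPack` are hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
-- Spelling out, by hand, roughly what GHC generates for an overloaded
-- literal such as  "hello" :: Text  under OverloadedStrings.
import Data.String (fromString)
import Data.Text (Text)
import qualified Data.Text as T
import GHC.Exts (unpackCString#)

-- "hello"# is a primitive C-string literal (an Addr#); unpackCString#
-- turns it into an ordinary String, which the IsString Text instance
-- then converts.
helloLiteral :: Text
helloLiteral = fromString (unpackCString# "hello"#)

-- Equivalent to the above, since fromString = pack for Text.
helloViaPack :: Text
helloViaPack = T.pack "hello"
```

If `pack` is inlined at each use site, every literal can end up with its own copy of the `unstream . S.map safe . S.streamList` pipeline, which would be one plausible explanation for the roughly 14× growth in CorePrep terms reported above (515,697 vs. 35,732).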

I wonder if something like this might fare better:

```haskell
-- Sketch; assumes MagicHash plus `import qualified Data.Text as T`
-- and `import qualified GHC.Exts as E`.
instance IsString Text where
    {-# INLINE fromString #-}
    fromString = textFromLit

-- Prevent it from inlining right away so rules can match.
{-# INLINEABLE[2] textFromLit #-}
textFromLit :: String -> Text
textFromLit s = T.pack s

{-# RULES
"fromCLit" forall s. textFromLit (E.unpackCString# s) = T.unpackCString# s
#-}
```

Each literal would then be rewritten straight to its final form, which should compile very quickly.
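The proposal above can only be compiled inside `text` itself, because `text` already defines `IsString Text`. A self-contained sketch of the same rewrite-rule technique, on a hypothetical newtype `Lit`, might look like the following. It assumes that `Data.Text` exports `unpackCString#` (as the snippet above relies on), and it swaps the `INLINEABLE[2]` pragma for a plain `NOINLINE` to keep the phase reasoning simple; `Lit`, `litFromString` and the rule name are made up.

```haskell
{-# LANGUAGE MagicHash #-}
import Data.String (IsString (..))
import Data.Text (Text)
import qualified Data.Text as T
import GHC.Exts (unpackCString#)

-- Hypothetical wrapper: text already provides IsString Text, so the
-- proposed instance is demonstrated on a newtype instead.
newtype Lit = Lit Text
  deriving (Eq, Show)

instance IsString Lit where
  {-# INLINE fromString #-}
  fromString = litFromString

-- Never inlined, so the rule below can still see the literal's
-- unpackCString# call. The general path just packs the String.
{-# NOINLINE litFromString #-}
litFromString :: String -> Lit
litFromString s = Lit (T.pack s)

-- Active from phase 1 on; the assumption is that base's own rewriting
-- of unpackCString# is only active before phase 1.
{-# RULES
"Lit/unpackCString#" [1] forall a.
    litFromString (unpackCString# a) = Lit (T.unpackCString# a)
  #-}
```

Without optimisation the rule never fires and the result comes from `T.pack`; either way the value is the same. Whether the rule actually fires under `-O` depends on how it interacts with base's rules for `unpackCString#`, and compiling with `-ddump-rule-firings` shows which rules fire.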

AndreasPK · Aug 12 '22 15:08

Of course, this doesn't cover lazy `Text` or literals containing non-ASCII characters (which GHC desugars via `unpackCStringUtf8#`); those would need their own rules, I suppose. But if there is no intention of supporting streaming over literals, this seems like the right approach.

AndreasPK · Aug 12 '22 15:08

I guess the best solution for `emojis` and `texmath` is to pass `-O0`; they are unlikely to gain anything from `-O1`. An even better option would be to evaluate the `emojis` data at compile time using Template Haskell (which implies `-O0`) and then `Lift` it back.

Bodigrim · Aug 20 '22 00:08

For `texmath`, I suspect `-O1` makes an appreciable difference to the performance of the parsers. I guess what you mean is that we could put `{-# OPTIONS_GHC -O0 #-}` on just the modules that contain lots of string literals? I'd be fine with that, but it seems odd to have to resort to such a workaround. I would hope that `text` could eventually be changed so that this isn't needed.

jgm · Aug 22 '22 03:08

I would be careful about enabling `-O0` just for specific modules, as it will affect how other modules are optimised, due to #20056.

mpickering · Aug 22 '22 17:08