language-bash
language-bash copied to clipboard
Performance
I use language-bash in ghcup to parse variable assignments from files like /etc/os-release
: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/lib/GHCup/Utils/Bash.hs
This works relatively well, but when I profile ghcup on the simplest command ghcup list
, I get this:
Thu Jun 25 21:56 2020 Time and Allocation Profiling Report (Final)
ghcup +RTS -p -RTS list
total time = 0.05 secs (52 ticks @ 1000 us, 1 processor)
total alloc = 56,666,312 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
word Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:300:1-43 15.4 40.2
jstring Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:316:1-32 11.5 5.5
try Control.Monad.Catch src/Control/Monad/Catch.hs:789:1-47 9.6 9.4
skipSpace Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(81,1)-(86,50) 5.8 1.5
<?> Data.Aeson.Types.Internal Data/Aeson/Types/Internal.hs:532:1-74 3.8 0.4
unfoldDirContents Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(50,1)-(59,69) 3.8 1.1
dirContentsStream Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(70,1)-(72,83) 3.8 2.0
lexeme Text.Megaparsec.Lexer Text/Megaparsec/Lexer.hs:78:1-23 3.8 3.2
pathParser' URI.ByteString.Internal src/URI/ByteString/Internal.hs:(567,1)-(569,82) 3.8 1.6
decimal Data.Attoparsec.ByteString.Char8 Data/Attoparsec/ByteString/Char8.hs:(447,1)-(448,49) 1.9 0.1
copy Data.HashMap.Array Data/HashMap/Array.hs:(328,1)-(333,30) 1.9 0.0
object_' Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(135,49)-(137,22) 1.9 0.5
catchAll Control.Monad.Catch src/Control/Monad/Catch.hs:744:1-16 1.9 1.5
mapExcepts Haskus.Utils.Variant.Excepts src/lib/Haskus/Utils/Variant/Excepts.hs:102:1-33 1.9 0.0
liftE Haskus.Utils.Variant.Excepts src/lib/Haskus/Utils/Variant/Excepts.hs:110:1-38 1.9 0.0
mapVariantAt Haskus.Utils.Variant src/lib/Haskus/Utils/Variant.hs:(413,1)-(416,47) 1.9 0.0
uncons Data.ByteString.Lazy.UTF8 Data/ByteString/Lazy/UTF8.hs:(228,1)-(229,38) 1.9 0.6
parseAbs HPath src/HPath.hs:(142,1)-(147,38) 1.9 0.0
word Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:175:1-66 1.9 0.1
ioDesc Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:188:1-46 1.9 0.3
assign Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:217:1-43 1.9 0.2
anyOperator Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:209:1-51 1.9 3.0
select Language.Bash.Operator src/Language/Bash/Operator.hs:19:1-43 1.9 1.8
runParsecT Text.Megaparsec.Internal Text/Megaparsec/Internal.hs:(591,1)-(596,56) 1.9 0.1
getOptic Optics.Internal.Optic src/Optics/Internal/Optic.hs:70:5-12 1.9 0.0
wrapCompile Text.Regex.Posix.Wrap src/Text/Regex/Posix/Wrap.hsc:(372,1)-(384,42) 1.9 0.0
regNameParser URI.ByteString.Internal src/URI/ByteString/Internal.hs:(533,1)-(535,59) 1.9 0.3
getDownloadsF GHCup.Download lib/GHCup/Download.hs:(105,1)-(127,42) 1.9 0.0
printListResult Main app/ghcup/Main.hs:(1332,1)-(1377,18) 1.9 0.2
satisfying Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:(130,1)-(132,49) 0.0 1.3
pack Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:(99,1)-(122,14) 0.0 2.2
anyWord Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:171:1-39 0.0 2.4
name Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(313,1)-(316,38) 0.0 1.8
functionName Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(320,1)-(324,31) 0.0 1.5
getDownloads GHCup.Download lib/GHCup/Download.hs:(143,1)-(261,42) 0.0 2.2
This is on a 7 LOC bash file:
NAME="Exherbo"
PRETTY_NAME="Exherbo Linux"
ID="exherbo"
ANSI_COLOR="0;32"
HOME_URL="https://www.exherbo.org/"
SUPPORT_URL="irc://irc.freenode.net/#exherbo"
BUG_REPORT_URL="https://bugs.exherbo.org/"
I'm wondering if there's a way to reduce the allocations.
Hmm, unfortunately I don't know too much about haskell performance, so I can't really say off the top of my head what's causing all the allocations. The parser certainly wasn't written with performance in mind.
If you know the files you're parsing are going to be in this nice list-of-variables format, maybe it would be better to use a custom parser?
I also notice you're parsing each file twice: https://github.com/haskell/ghcup-hs/blob/fafff9dadd60e7f85d1fdea00b324bae7b40164c/lib/GHCup/Platform.hs#L153
If you know the files you're parsing are going to be in this nice list-of-variables format, maybe it would be better to use a custom parser?
I started that already, but I don't think it's a good idea. There are so many assumptions when doing a naive parser and I won't have an easy way of knowing whether a valid user file was incorrectly parsed, because these are silent failures.
On the other hand it seems this is only a shell subset: https://www.freedesktop.org/software/systemd/man/os-release.html
So maybe feasible after all.
I also notice you're parsing each file twice: https://github.com/haskell/ghcup-hs/blob/fafff9dadd60e7f85d1fdea00b324bae7b40164c/lib/GHCup/Platform.hs#L153
good catch