ghc-events
ghc-events copied to clipboard
getEvent can't deserialize event serialized by putEvent
ghc-events currently fails to deserialize some events that it serializes. To reproduce, first generate a minimal eventlog file from the program main = return ()
(tried GHC 8.4), then run
import Data.Binary.Get
import Data.Binary.Put
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.RTS.Events
import GHC.RTS.Events.Binary
import GHC.RTS.Events.Incremental
main = do
bs <- LBS.readFile "empty.eventlog"
let header = runGet getHeader bs
let parsers = mkEventParsers header
let event = Event
{ evTime = 415731793
, evSpec = GCStatsGHC
{ heapCapset = 0
, gen = 1
, copied = 305104
, slop = 250088
, frag = 3088384
, parNThreads = 12
, parMaxCopied = 4246
, parTotCopied = 305104
}
, evCap = Nothing
}
print (runGet (getEvent parsers) (runPut (putEvent event)))
Output:
bug: Data.Binary.Get.runGet at position 60: not enough bytes
CallStack (from HasCallStack):
error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get
What does mkEventParsers
do?
Given a header it returns event parsers (mkEventParsers :: Header -> EventParsers
). See the definition here: https://github.com/osa1/ghc-events/blob/5090f824fa709415336248342674fdd4862695da/src/GHC/RTS/Events/Incremental.hs#L161 basically just the event parser builder from mkEventDecoder
factored out for reuse.
I added a bunch of debug prints and it seems like the parser for EVENT_GC_STATS_GHC event can actually parse this and the parsed evSpec is identical to the original one, but somehow getEvent
needs one more word before returning. If I add one more word to the bytestring
let bs = runPut (putEvent event)
print (runGet (getEvent parsers) (bs <> "00000000"))
it works...
OK so this is because of padParser
which for some reason makes this parser expect one more word after the actual contents. Investigating more ...
So according to the event type parser event type of EVENT_GC_STATS_GHC is
EventType {num = 53, desc = "GC statistics", size = Just 58}
this says the size is 58 but actually it's declared as 50 in GHC.RTS.Events.Binary
and putEvent
also puts 50 bytes for the event spec (60 when you add the timestamp and event type).
So assuming event type parser is not buggy
-
ghc-events
thinks size ofEVENT_GC_STATS_GHC
is 50 - GHC thinks its size is 58
Not sure why there's a mismatch. Perhaps there's one more field to EVENT_GC_STATS
that we currently don't parse?
Right, so GHC generates one more word for EVENT_GC_STATS_GHC
which is for "par_balanced_copied": https://github.com/ghc/ghc/blob/65eec9cfd4410c0e30b0ed06116c15f8ce3de49d/rts/eventlog/EventLog.c#L413-L424
Not sure how to deal with this ...
Quoting from the diff that added this new field: https://phabricator.haskell.org/D3658
Add the new value at the end, and we'll gradually deprecate par_max_copied. Old clients that expect the old format will understand everything except the new field, and will mostly work, except that they might miscalculate work balance.
So they relied on the fact that ghc-events (or other GHC event parsers) can deal with events larger than the parser for the event expects.