ghc-events icon indicating copy to clipboard operation
ghc-events copied to clipboard

getEvent can't deserialize event serialized by putEvent

Open osa1 opened this issue 5 years ago • 7 comments

ghc-events currently fails to deserialize some events that it serializes. To reproduce, first generate a minimal eventlog file from the program main = return () (tried GHC 8.4), then run

import Data.Binary.Get
import Data.Binary.Put
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LBS
import GHC.RTS.Events
import GHC.RTS.Events.Binary
import GHC.RTS.Events.Incremental

main = do
    bs <- LBS.readFile "empty.eventlog"
    let header = runGet getHeader bs
    let parsers = mkEventParsers header

    let event = Event
          { evTime = 415731793
          , evSpec = GCStatsGHC
              { heapCapset = 0
              , gen = 1
              , copied = 305104
              , slop = 250088
              , frag = 3088384
              , parNThreads = 12
              , parMaxCopied = 4246
              , parTotCopied = 305104
              }
          , evCap = Nothing
          }

    print (runGet (getEvent parsers) (runPut (putEvent event)))

Output:

bug: Data.Binary.Get.runGet at position 60: not enough bytes
CallStack (from HasCallStack):
  error, called at libraries/binary/src/Data/Binary/Get.hs:351:5 in binary-0.8.5.1:Data.Binary.Get

osa1 avatar Aug 28 '18 13:08 osa1

What does mkEventParsers do?

maoe avatar Aug 28 '18 23:08 maoe

Given a header it returns event parsers (mkEventParsers :: Header -> EventParsers). See the definition here: https://github.com/osa1/ghc-events/blob/5090f824fa709415336248342674fdd4862695da/src/GHC/RTS/Events/Incremental.hs#L161 basically just the event parser builder from mkEventDecoder factored out for reuse.

osa1 avatar Aug 29 '18 09:08 osa1

I added a bunch of debug prints and it seems like the parser for EVENT_GC_STATS_GHC event can actually parse this and the parsed evSpec is identical to the original one, but somehow getEvent needs one more word before returning. If I add one more word to the bytestring

let bs = runPut (putEvent event)
print (runGet (getEvent parsers) (bs <> "00000000"))

it works...

osa1 avatar Aug 29 '18 14:08 osa1

OK so this is because of padParser which for some reason makes this parser expect one more word after the actual contents. Investigating more ...

osa1 avatar Aug 29 '18 14:08 osa1

So according to the event type parser event type of EVENT_GC_STATS_GHC is

EventType {num = 53, desc = "GC statistics", size = Just 58}

this says the size is 58 but actually it's declared as 50 in GHC.RTS.Events.Binary and putEvent also puts 50 bytes for the event spec (60 when you add the timestamp and event type).

So assuming event type parser is not buggy

  • ghc-events thinks size of EVENT_GC_STATS_GHC is 50
  • GHC thinks its size is 58

Not sure why there's a mismatch. Perhaps there's one more field to EVENT_GC_STATS that we currently don't parse?

osa1 avatar Aug 29 '18 14:08 osa1

Right, so GHC generates one more word for EVENT_GC_STATS_GHC which is for "par_balanced_copied": https://github.com/ghc/ghc/blob/65eec9cfd4410c0e30b0ed06116c15f8ce3de49d/rts/eventlog/EventLog.c#L413-L424

Not sure how to deal with this ...

osa1 avatar Aug 29 '18 14:08 osa1

Quoting from the diff that added this new field: https://phabricator.haskell.org/D3658

Add the new value at the end, and we'll gradually deprecate par_max_copied. Old clients that expect the old format will understand everything except the new field, and will mostly work, except that they might miscalculate work balance.

So they relied on the fact that ghc-events (or other GHC event parsers) can deal with events larger than the parser for the event expects.

osa1 avatar Aug 29 '18 15:08 osa1