Enhance handling of large eventlogs
Original reporter: jan.stolarek@
When I enable detailed spark logging via the -lf flag I end up with huge eventlog files (130MB). Attempting to load these into ThreadScope practically kills my OS: memory runs out, swapping begins and I am forced to kill TS (which takes some time before the OS actually responds and kills the process). This makes the -lf flag useless for my program, and I think this might not be an uncommon situation. It would be good if TS supported some sort of lazy loading of big eventlogs, so users could at least view parts of the log.
Even with the -l flag I get a 500 MB eventlog and ThreadScope eats 16 GB of RAM. Please provide either some sort of granularity control before loading the file or live streaming.
This is still a problem. Loading a 1G eventlog file is impossible even with 32G RAM. I think we need two things:
- Externally sort GHC-generated .eventlog files. Currently, for sorting events ThreadScope uses ghc-events's sortEvents, which requires all events to be in memory and uses Data.List to sort. See https://github.com/haskell/ghc-events/issues/32 for the tracking issue for this.
- Implement an abstraction over Array Int Event that doesn't require loading the whole file into memory. As far as I can see this array is used in two places:
  - hecEventArray, which uses it to implement
    - eventIndexToTimestamp :: HECs -> Int -> Timestamp
    - timestampToEventIndex :: HECs -> Timestamp -> Int
  - EventsView, which uses a range of it to show the "Raw events" tab
So it seems to me that we need to support three operations:
- Get nth event
- Get events in the given range (can be implemented using the first operation)
- Get index of the event at given timestamp (this currently does binary search)
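To make the three operations concrete, here is a minimal sketch of what such an abstraction could look like. Everything in it is an assumption for illustration: Ev is a simplified stand-in for a ghc-events event, EventStore, fromArray and lowerBound are hypothetical names, and the timestamp lookup returns the index of the first event at or after the given time, which may differ from what timestampToEventIndex does today. The point is only that an in-memory array and a disk-backed store could sit behind the same record of functions.

```haskell
import Data.Array (Array, bounds, listArray, (!))
import Data.Word (Word64)

-- Hypothetical, simplified event: only the timestamp matters for indexing.
type Timestamp = Word64
data Ev = Ev { evTime :: Timestamp, evDesc :: String } deriving (Show, Eq)

-- The operations as a record of functions, so that an in-memory array and
-- a filesystem-backed store can share one interface (indices are 0-based).
data EventStore m = EventStore
  { storeSize   :: m Int
  , nthEvent    :: Int -> m Ev
  , indexAtTime :: Timestamp -> m Int
  }

-- Range lookup (inclusive bounds), built from nthEvent.
eventsInRange :: Monad m => EventStore m -> Int -> Int -> m [Ev]
eventsInRange st lo hi = mapM (nthEvent st) [lo .. hi]

-- Binary search over n time-ordered events: index of the first event whose
-- timestamp is >= t, or n if there is none. It only needs "timestamp at
-- index i", so it works for any backend.
lowerBound :: Monad m => Int -> (Int -> m Timestamp) -> Timestamp -> m Int
lowerBound n timeAt t = go 0 n
  where
    go lo hi
      | lo >= hi  = pure lo
      | otherwise = do
          let mid = lo + (hi - lo) `div` 2
          tm <- timeAt mid
          if tm < t then go (mid + 1) hi else go lo mid

-- In-memory backend wrapping an array that is already sorted by time.
fromArray :: Monad m => Array Int Ev -> EventStore m
fromArray arr = EventStore
  { storeSize   = pure n
  , nthEvent    = \i -> pure (arr ! (lo + i))
  , indexAtTime = lowerBound n (\i -> pure (evTime (arr ! (lo + i))))
  }
  where
    (lo, hi) = bounds arr
    n        = hi - lo + 1

-- Small sample store for experimenting in GHCi.
sampleStore :: EventStore IO
sampleStore = fromArray (listArray (0, 3) [Ev 10 "a", Ev 20 "b", Ev 20 "c", Ev 35 "d"])
```

A disk-backed version would provide the same three fields but read events from a file or cache inside nthEvent, so callers such as the "Raw events" view would not need to change.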
One idea that comes to mind is to use something like SQLite, which makes these operations almost trivial.
One thing that may be a problem is scrolling the "Raw events" tab, since every scroll would query a filesystem-backed event database (SQLite or not). So we may have to implement lazy rendering of "Raw events" (as far as I can see it doesn't support this currently: drawEvents blocks the thread until all events in the range are drawn).
Any other ideas?
I started working on a fix. I currently have an external sort library and another library for filesystem-backed, cached arrays. I'll report in a few days probably.
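For anyone unfamiliar with external sorting, the usual pattern is: split the input into runs small enough to sort in memory, sort each run, then merge the sorted runs while streaming. The sketch below shows only that structure on plain lists and is not the library mentioned above; chunksOf, merge2, mergeRuns and externalSortOn are hypothetical names, and a real implementation would write each sorted run to a temporary file and read the runs back incrementally during the merge.

```haskell
import Data.List (sortOn)

-- Split the input into runs of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: run size must be positive"
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Stable merge of two sorted runs: on equal keys the left run goes first.
merge2 :: Ord k => (a -> k) -> [a] -> [a] -> [a]
merge2 _ [] ys = ys
merge2 _ xs [] = xs
merge2 key (x:xs) (y:ys)
  | key y < key x = y : merge2 key (x:xs) ys
  | otherwise     = x : merge2 key xs (y:ys)

-- Merge all runs by repeatedly merging neighbouring pairs.
mergeRuns :: Ord k => (a -> k) -> [[a]] -> [a]
mergeRuns _   []  = []
mergeRuns _   [r] = r
mergeRuns key rs  = mergeRuns key (pairs rs)
  where
    pairs (a:b:rest) = merge2 key a b : pairs rest
    pairs rest       = rest

-- Sort by key using runs of at most runSize elements. Here the runs stay
-- in memory; on disk, only one run at a time would need to be resident
-- while sorting, and the merge would stream from the run files.
externalSortOn :: Ord k => Int -> (a -> k) -> [a] -> [a]
externalSortOn runSize key = mergeRuns key . map (sortOn key) . chunksOf runSize
```

For eventlogs the key would be the event timestamp. Since the merge is stable, events with equal timestamps keep their original relative order.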
Currently blocked on https://github.com/haskell/ghc-events/issues/42.
We may need to fix haskell/ghc-events#14 as well since it causes ghc-events to crash when reading back serialized events for eventlogs that contain deprecated events.