Enhance handling of large eventlogs

Open ghc-mirror opened this issue 11 years ago • 5 comments

Original reporter: jan.stolarek@

When I enable detailed spark logging via the -lf flag, I end up with huge eventlog files (130 MB). Attempting to load these into ThreadScope practically kills my OS: memory runs out, swapping begins, and I am forced to kill TS (which takes some time before the OS actually responds and kills the process). This makes the -lf flag useless for my program, and I think this might not be an uncommon situation. It would be good if TS supported some sort of lazy loading of big eventlogs, so users could at least view parts of the log.

ghc-mirror avatar May 16 '14 10:05 ghc-mirror

Even with the -l flag I get a 500 MB eventlog, and ThreadScope eats 16 GB of RAM. Please provide either some sort of granularity control before loading the file, or live streaming.

ghost avatar Jan 05 '16 00:01 ghost

This is still a problem. Loading a 1 GB eventlog file is impossible even with 32 GB of RAM. I think we need two things:

  • Externally sort GHC-generated .eventlog files. Currently for sorting events ThreadScope uses ghc-events's sortEvents, which requires all events to be in memory and uses Data.List to sort. See https://github.com/haskell/ghc-events/issues/32 for the tracking issue for this.

  • Implement an abstraction over Array Int Event that doesn't require loading the whole file into memory. As far as I can see, this array is used in two places:

    • hecEventArray, which uses it to implement

      • eventIndexToTimestamp :: HECs -> Int -> Timestamp
      • timestampToEventIndex :: HECs -> Timestamp -> Int

    • EventsView, which uses a range of it to show the "Raw events" tab

    So it seems to me that we need to support three operations:

    1. Get the nth event
    2. Get the events in a given range (can be implemented using (1))
    3. Get the index of the event at a given timestamp (this currently does a binary search)
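The three operations above could be captured by a small backend-agnostic interface. The following is only a sketch: `EventStore`, `listStore`, `eventsInRange`, and `timestampToIndex` are hypothetical names, `Event` is a simplified stand-in for the ghc-events type, and the boundary convention of the binary search (last event at or before the timestamp, clamped to 0) is an assumption, not necessarily what ThreadScope does.

```haskell
import Data.Word (Word64)

type Timestamp = Word64

-- Simplified stand-in for the ghc-events Event type (assumption).
data Event = Event { evTime :: Timestamp, evDesc :: String }
  deriving (Eq, Show)

-- Hypothetical interface: the backend (in-memory array, cached file,
-- database, ...) only has to provide the size and random access.
data EventStore m = EventStore
  { storeSize :: m Int
  , nthEvent  :: Int -> m Event   -- operation 1
  }

-- Operation 2: events in the inclusive index range, built on nthEvent.
eventsInRange :: Monad m => EventStore m -> Int -> Int -> m [Event]
eventsInRange store lo hi = mapM (nthEvent store) [lo .. hi]

-- Operation 3: index of the last event whose timestamp is <= t,
-- clamped to 0. Assumes events are sorted by timestamp.
timestampToIndex :: Monad m => EventStore m -> Timestamp -> m Int
timestampToIndex store t = do
  n <- storeSize store
  firstAfter <- go 0 n
  pure (max 0 (firstAfter - 1))
  where
    -- Invariant: events before lo have time <= t,
    -- events from hi onwards have time > t.
    go lo hi
      | lo >= hi = pure lo
      | otherwise = do
          let mid = lo + (hi - lo) `div` 2
          ev <- nthEvent store mid
          if evTime ev <= t then go (mid + 1) hi else go lo mid

-- Trivial in-memory backend, useful only for trying the interface out.
listStore :: Applicative m => [Event] -> EventStore m
listStore evs = EventStore
  { storeSize = pure (length evs)
  , nthEvent  = \i -> pure (evs !! i)
  }
```

Because the search only goes through `nthEvent`, a file-backed or database-backed store needs O(log n) lookups per query and never has to hold the whole eventlog in memory.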

One idea that comes to mind is to use something like SQLite, which makes these operations almost trivial.

One thing that may be a problem is scrolling the "Raw events" tab, since every scroll would query a filesystem-backed event database (SQLite or not). We may therefore have to implement lazy rendering of "Raw events". As far as I can see it doesn't support this currently: drawEvents blocks the thread until all events in the range are drawn.

Any other ideas?

osa1 avatar Aug 25 '18 08:08 osa1

I started working on a fix. I currently have an external sort library and another library for filesystem-backed, cached arrays. I'll probably report back in a few days.
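For readers unfamiliar with external sorting, the general idea is to sort bounded-size runs independently and then merge them. The sketch below is not the library mentioned above; it is a pure, list-based illustration in which each run stands in for a temporary file on disk, and all names (`externalSortOn`, `mergeAll`, `mergeOn`, `chunksOf`) are hypothetical.

```haskell
import Data.List (sortOn, unfoldr)

-- Split the input into runs of at most n elements (n is forced to be >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = takeWhile (not . null) . unfoldr (Just . splitAt (max 1 n))

-- Merge two runs that are already sorted by key; ties prefer the left run.
mergeOn :: Ord k => (a -> k) -> [a] -> [a] -> [a]
mergeOn _ [] ys = ys
mergeOn _ xs [] = xs
mergeOn key (x:xs) (y:ys)
  | key x <= key y = x : mergeOn key xs (y:ys)
  | otherwise      = y : mergeOn key (x:xs) ys

-- Merge all runs pairwise until a single sorted run remains.
mergeAll :: Ord k => (a -> k) -> [[a]] -> [a]
mergeAll _   []  = []
mergeAll _   [r] = r
mergeAll key rs  = mergeAll key (pairs rs)
  where
    pairs (a:b:rest) = mergeOn key a b : pairs rest
    pairs rest       = rest

-- Sort runs of bounded size independently, then merge them.
-- In a real external sort each sorted run would be written to disk
-- and the merge would stream from those files.
externalSortOn :: Ord k => Int -> (a -> k) -> [a] -> [a]
externalSortOn runSize key =
  mergeAll key . map (sortOn key) . chunksOf runSize
```

Only one run has to be sorted in memory at a time, and the merge step needs just the head of each run, which is what keeps peak memory bounded in a disk-backed version.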

osa1 avatar Aug 25 '18 11:08 osa1

Currently blocked on https://github.com/haskell/ghc-events/issues/42.

osa1 avatar Aug 29 '18 15:08 osa1

We may need to fix haskell/ghc-events#14 as well since it causes ghc-events to crash when reading back serialized events for eventlogs that contain deprecated events.

maoe avatar Sep 01 '18 20:09 maoe