reactive-banana
Memory leak with dynamic behavior switching
The following program:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    let e2 = observeE $ e $> do
          stepper () (void e)
    b' <- switchB (pure ()) e2
    reactimate $ return <$> b' <@ e
  performGC
  putStrLn "Running"
  replicateM_ 50000 $ do
    fire1 ()
    performGC
Leaks memory:
[eventlog2html heap profile showing the leak]
I've also modified doAddChild to print out the number of children in a parent:
doAddChild :: SomeNode -> SomeNode -> IO ()
doAddChild (P parent) (P child) = do
  level1 <- _levelP <$> readRef child
  level2 <- _levelP <$> readRef parent
  let level = level1 `max` (level2 + 1)
  w <- parent `connectChild` P child
  print =<< length . _childrenP <$> readRef parent -- <- NEW
  modify' child $ set levelP level . update parentsP (w:)
doAddChild (P parent) node = void $ parent `connectChild` node
doAddChild (L _) _ = error "doAddChild: Cannot add children to LatchWrite"
doAddChild (O _) _ = error "doAddChild: Cannot add children to Output"
This shows that a node has 50000 children at the end. My guess is that this node is the Event e: whenever execute fires we attach a new stepper to e, but these children are never removed, even though switchB should be discarding them.
Nice, can you share how you made the pretty graph?
Sure! I just used eventlog2html :smile:
More specifically:
$ cabal run leak -- +RTS -l -hi
and build with -rtsopts -eventlog.
I often also build with -finfo-table-map -fdistinct-constructor-tables, as per https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
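For reference, the corresponding .cabal stanza would look roughly like this (the executable name leak comes from the cabal run command above; the rest is an assumed minimal setup):
executable leak
  main-is:       Main.hs
  build-depends: base, reactive-banana
  ghc-options:   -rtsopts -eventlog
                 -finfo-table-map -fdistinct-constructor-tables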
The same problem is present with just dynamic event switching - no need to bring Behaviors in:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    let e2 = observeE $ e $> do
          accumE () (id <$ e)
    e3 <- switchE never e2
    reactimate $ return <$> e3
  performGC
  putStrLn "Running"
  replicateM_ 10000 $ do
    fire1 ()
    performGC
I'll try and solve this leak first.
Ok, the fix for both of these leaks isn't hard - we can just modify doAddChild to cull any dead children:
doAddChild (P parent) (P child) = do
  level1 <- _levelP <$> readRef child
  level2 <- _levelP <$> readRef parent
  let level = level1 `max` (level2 + 1)
  w <- parent `connectChild` P child
  -- Remove any dead children. These three lines are new.
  let alive w = maybe False (const True) <$> deRefWeak w
  children' <- filterM alive . _childrenP =<< readRef parent
  modify' parent $ set childrenP children'
  modify' child $ set levelP level . update parentsP (w:)
But I'm not particularly happy with this solution. When the switchE fires I feel we should be able to propagate this information all the way up to e. I'll have a think about how to do this.
When implementing this, I was hoping to use finalizers to remove dead children — i.e. when switchE switches to a new event, the old event may become garbage and the corresponding finalizer would remove it from the _childrenP field.
Hm. Finalizers are run concurrently, but to keep our sanity, changes to the network need to be sequential and scheduled (e.g. using the writer part of the Build monad). Perhaps we should implement our own GC pass that is executed at the end of every step, and the finalizers simply tell our GC more specific information about which weak pointers it should remove? 🤔
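As a rough sketch of that idea (my own, not reactive-banana code; node stands in for something like SomeNode), finalizers would only enqueue notifications, and the network would drain the queue sequentially at the end of each step:
import Control.Concurrent.STM

-- Called from a finalizer: cheap and thread-safe, no graph mutation here.
scheduleCull :: TQueue node -> node -> IO ()
scheduleCull q = atomically . writeTQueue q

-- Our "GC pass": run by the network at the end of every step, so all
-- changes to the graph stay sequential and scheduled.
runCullPass :: TQueue node -> (node -> IO ()) -> IO ()
runCullPass q removeDeadChildrenOf =
  atomically (flushTQueue q) >>= mapM_ removeDeadChildrenOf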
(One issue that I didn't think deeply enough about is the question of how fast we can remove transitive dependencies. I.e. event e3 may depend on e2, which depends on e1. Now, if e3 is not used anymore, then both e2 and e1 can be garbage collected, but that should preferably happen in a single GC pass, as opposed to two GC passes where we first discover that e2 is dead and only in the next pass that e1 is also dead because e2 is dead. The GHC GC does this all right, but any GC addendum that we implement might not.)
@HeinrichApfelmus I've also thought about using finalizers, but the whole thing seems a lot more complex/action-at-a-distance than it needs to be. As far as I'm aware, we always have the entire graph right in front of us, through a Pulse's parent/children lists. So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.
I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all! I'd hate to be in a position where I accumulate just enough garbage to impact performance, but not enough to trigger the right generation GC to solve the problem.
> So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.
Yes and no. The trouble is twofold:
1. The program may reference a Pulse (e.g. through an Event) even though that Pulse is currently not an active part of the network — but it may become part of the network again later. For example, a switchE periodically switching between two events e1 and e2 has this property — both events need to be kept alive (especially if they involve state), but only one of them is in the transitive closure of the current list of reactimate (see the sketch after this list). This implies that we do need help from the garbage collector.
2. Conversely, the garbage collector may still think that a Pulse is alive during a switchE, even though that Pulse becomes dead through the switch. Hence, the garbage collector may have some delay, and tell us that a Pulse can be removed only some time after the moment of switching. This implies that we need to expect help from the garbage collector in an asynchronous manner.
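To make point 1 concrete, a small sketch (my own example, using only the public API): both counters carry accumE state and must be kept alive, yet only the currently selected one is reachable from the reactimate side at any moment.
import Reactive.Banana

-- Alternates between two stateful events on every toggle occurrence.
flipFlop :: Event () -> Moment (Event Int)
flipFlop toggle = do
  e1   <- accumE 0 ((+ 1) <$ toggle)       -- stateful event #1
  e2   <- accumE 0 (subtract 1 <$ toggle)  -- stateful event #2
  pick <- accumB True (not <$ toggle)      -- flips on every toggle
  let chosen = (\p -> if p then e1 else e2) <$> pick
  -- Periodically switches between e1 and e2; the unselected one must
  -- not be collected, since we may switch back to it later.
  switchE never (chosen <@ toggle)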
> I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all!
I do agree that the documentation on finalizers is rather pessimistic. However, I feel that we may not have a choice, and in practice, it does not seem too bad (well, if it is bad, then we can report this as a bug in GHC. 😄)
Yea, I was thinking over 1 yesterday! Thanks for sharing. Something I also want to do is try modelling our graph in Alloy and use a model checker to work out the complexities here!
Ok, I might have another fix:
 connectChild parent child = do
     w <- mkWeakNodeValue child child
     modify' parent $ update childrenP (w:)
+
+    -- Add a finalizer to remove the child from the parent when the child dies.
+    case child of
+      P c@(Ref r _) -> addFinalizer r $ removeParents c
+      _             -> return ()
+
     mkWeakNodeValue child (P parent) -- child keeps parent alive
The idea is pretty trivial: when a Pulse becomes unreachable by the GC, remove it from all its parents. I think that the only things we're actually leaking (in the examples here) are the Weak value for the child and the cons cell in the parent's list of children. I tested this with the following (even simpler) repro:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Control.Monad.IO.Class
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
import Control.Concurrent (threadDelay, yield)
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    e2 <- execute $ e $> do
      accumE () (id <$ e)
    reactimate $ return () <$ e2
  performGC
  putStrLn "Running"
  replicateM_ 10000 $ do
    fire1 ()
    performMajorGC
    -- yield so finalizers can run.
    yield
  putStrLn "Done"
Run with the -hi profile and analyzed in eventlog2html, we see:
[eventlog2html heap profile of the repro above]
Some noise, but that blue line is the signal - a clear leak.
With the fix above, we get:
[eventlog2html heap profile with the fix applied]
But I also had to run 10x the number of iterations, otherwise it terminated too quickly!
So I think I've got a good handle on at least one fix. I think the way to proceed from here is to add a finalizer when we call newLatch or newPulse though - connectChild is obviously the wrong place.
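Something in this shape, perhaps (a toy sketch with stand-in types; the real Pulse, Ref, and removeParents are the ones from the diff above, and here mkWeakIORef plays the role of addFinalizer):
import Data.IORef

-- Stand-ins (assumptions) for the library's internal types.
data Pulse = Pulse { pulseRef :: IORef Int, pulseName :: String }

-- Placeholder for the real cleanup shown in the diff above.
removeParents :: Pulse -> IO ()
removeParents p = putStrLn ("culling " ++ pulseName p)

-- The proposed move: attach the cleanup finalizer once, at creation
-- time, instead of on every connectChild call.
newPulse :: String -> IO Pulse
newPulse name = do
  r <- newIORef 0
  let p = Pulse r name
  -- The finalizer may mention p without keeping r alive.
  _ <- mkWeakIORef r (removeParents p)
  pure p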
A note to myself as to why we can't just use Ref:
- Assume we have some chain of Pulses p1, p2, ... pn, where each pulse is a child of the previous (so p1 is the parent of p2, etc).
- We have some top-level pulse p which is the parent of p1.
- Now, introduce pX, which is derived by dynamically switching between some other Pulse and pn.
- If we clean up the graph the instant pX switches out of pn, then we'll end up detaching p1 from p.
- However, if we switch back to pn, then we'll never get any events, because pn is disconnected from p!
This is why we think we need help from the GC. When pX switches out of pn then yes, pX should reparent. But we do still need to keep pushing occurrences from p through pn, because the dynamic event switch keeps pn alive. I agree that it's going to be very hard to do this without letting the GC inform us.
Note that if we use dynamic event switching and switch out of pn with no possibility of switching back (e.g. something uses never, or some other mechanism makes switching back impossible), then we'd lose any strong pointer to pn, allowing it to be GCed.
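For instance (a sketch of my own, public API only), switching to never with no route back leaves the inner chain unreachable:
import Reactive.Banana

-- After the first trigger we switch to never; nothing references innerE
-- (which plays the role of pn) any more, so the GC can reclaim its chain.
switchAwayForever :: Event () -> Moment (Event Int)
switchAwayForever trigger = do
  innerE <- accumE 0 ((+ 1) <$ trigger)
  switchE innerE (never <$ trigger)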
I need to think about promptly cleaning up a whole sequence of Pulses, but otherwise this is taking shape.