reactive-banana
Memory leak with dynamic behavior switching
The following program:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    let e2 = observeE $ e $> do
          stepper () (void e)
    b' <- switchB (pure ()) e2
    reactimate $ return <$> b' <@ e
  performGC
  putStrLn "Running"
  replicateM_ 50000 $ do
    fire1 ()
    performGC
Leaks memory:
[eventlog2html heap profile showing the leak]
I've also modified doAddChild to print out the number of children in a parent:
doAddChild :: SomeNode -> SomeNode -> IO ()
doAddChild (P parent) (P child) = do
  level1 <- _levelP <$> readRef child
  level2 <- _levelP <$> readRef parent
  let level = level1 `max` (level2 + 1)
  w <- parent `connectChild` P child
  print =<< length . _childrenP <$> readRef parent -- <- NEW
  modify' child $ set levelP level . update parentsP (w:)
doAddChild (P parent) node = void $ parent `connectChild` node
doAddChild (L _) _ = error "doAddChild: Cannot add children to LatchWrite"
doAddChild (O _) _ = error "doAddChild: Cannot add children to Output"
This shows that a node has 50000 children at the end. My guess is that this node is the Event e: whenever execute fires we attach a new stepper to e, but these children are never removed, even though switchB should be discarding them.
Nice, can you share how you made the pretty graph?
Sure! I just used eventlog2html :smile:
More specifically:
$ cabal run leak -- +RTS -l -hi
and build with -rtsopts -eventlog.
I often also build with -finfo-table-map -fdistinct-constructor-tables, as per https://well-typed.com/blog/2021/01/first-look-at-hi-profiling-mode/
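For reference, the corresponding .cabal stanza would look roughly like this (the executable name leak comes from the cabal run command above; the rest is an assumed minimal setup):
executable leak
  main-is:       Main.hs
  build-depends: base, reactive-banana
  ghc-options:   -rtsopts -eventlog
                 -finfo-table-map -fdistinct-constructor-tables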
The same problem is present with just dynamic event switching - no need to bring Behaviors in:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    let e2 = observeE $ e $> do
          accumE () (id <$ e)
    e3 <- switchE never e2
    reactimate $ return <$> e3
  performGC
  putStrLn "Running"
  replicateM_ 10000 $ do
    fire1 ()
    performGC
I'll try and solve this leak first.
Ok, the fix for both of these leaks isn't hard - we can just modify doAddChild to cull any dead children:
doAddChild (P parent) (P child) = do
  level1 <- _levelP <$> readRef child
  level2 <- _levelP <$> readRef parent
  let level = level1 `max` (level2 + 1)
  w <- parent `connectChild` P child
  -- Remove any dead children. These three lines are new.
  let alive w = maybe False (const True) <$> deRefWeak w
  children' <- filterM alive . _childrenP =<< readRef parent
  modify' parent $ set childrenP children'
  modify' child $ set levelP level . update parentsP (w:)
But I'm not particularly happy with this solution. When the switchE fires I feel we should be able to propagate this information all the way up to e. I'll have a think about how to do this.
When implementing this, I was hoping to use finalizers to remove dead children — i.e. when switchE switches to a new event, the old event may become garbage and the corresponding finalizer would remove it from the _childrenP field.
Hm. Finalizers are run concurrently, but to keep our sanity, changes to the network need to be sequential and scheduled (e.g. using the writer part of the Build monad). Perhaps we should implement our own GC pass that is executed at the end of every step, and the finalizers simply tell our GC more specific information about which weak pointers it should remove? 🤔
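As a rough sketch of that idea (my own, not reactive-banana code; node stands in for something like SomeNode), finalizers would only enqueue notifications, and the network would drain the queue sequentially at the end of each step:
import Control.Concurrent.STM

-- Called from a finalizer: cheap and thread-safe, no graph mutation here.
scheduleCull :: TQueue node -> node -> IO ()
scheduleCull q = atomically . writeTQueue q

-- Our "GC pass": run by the network at the end of every step, so all
-- changes to the graph stay sequential and scheduled.
runCullPass :: TQueue node -> (node -> IO ()) -> IO ()
runCullPass q removeDeadChildrenOf =
  atomically (flushTQueue q) >>= mapM_ removeDeadChildrenOf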
(One issue that I didn't think deeply enough about is the question of how fast we can remove transitive dependencies. I.e. event e3 may depend on e2, which depends on e1. Now, if e3 is not used anymore, then both e2 and e1 can be garbage collected, but that should preferably happen in a single GC pass, as opposed to two GC passes where we first discover that e2 is dead and only in the next pass that e1 is also dead because e2 is dead. The GHC GC does this all right, but any GC addendum that we implement might not.)
@HeinrichApfelmus I've also thought about using finalizers, but the whole thing seems a lot more complex/action-at-a-distance than it needs to be. As far as I'm aware, we always have the entire graph right in front of us, through a Pulse's parent/children lists. So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.
I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all! I'd hate to be in a position where I accumulate just enough garbage to impact performance, but not enough to trigger the right generation GC to solve the problem.
> So if we dynamically switch away from something, we should - at that point - be able to find strongly connected components within this graph that are no longer reachable and nuke the whole lot.
Yes and no. The trouble is twofold:
1. The program may reference a Pulse (e.g. through an Event) even though that Pulse is currently not an active part of the network — but it may become part of the network again later. For example, a switchE periodically switching between two events e1 and e2 has this property — both events need to be kept alive (especially if they involve state), but only one of them is in the transitive closure of the current list of reactimate (see the sketch after this list). This implies that we do need help from the garbage collector.
2. Conversely, the garbage collector may still think that a Pulse is alive during a switchE, even though that Pulse becomes dead through the switch. Hence, the garbage collector may have some delay, and tell us that a Pulse can be removed only some time after the moment of switching. This implies that we need to expect help from the garbage collector in an asynchronous manner.
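To make point 1 concrete, a small sketch (my own example, using only the public API): both counters carry accumE state and must be kept alive, yet only the currently selected one is reachable from the reactimate side at any moment.
import Reactive.Banana

-- Alternates between two stateful events on every toggle occurrence.
flipFlop :: Event () -> Moment (Event Int)
flipFlop toggle = do
  e1   <- accumE 0 ((+ 1) <$ toggle)       -- stateful event #1
  e2   <- accumE 0 (subtract 1 <$ toggle)  -- stateful event #2
  pick <- accumB True (not <$ toggle)      -- flips on every toggle
  let chosen = (\p -> if p then e1 else e2) <$> pick
  -- Periodically switches between e1 and e2; the unselected one must
  -- not be collected, since we may switch back to it later.
  switchE never (chosen <@ toggle)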
> I don't like finalizers partly because it's unclear when they will run, but more that it's unclear if they will run at all!
I do agree that the documentation on finalizers is rather pessimistic. However, I feel that we may not have a choice, and in practice, it does not seem too bad (well, if it is bad, then we can report this as a bug in GHC. 😄)
Yea, I was thinking over 1 yesterday! Thanks for sharing. Something I also want to do is try modelling our graph in Alloy and use a model checker to work out the complexities here!
Ok, I might have another fix:
 connectChild parent child = do
     w <- mkWeakNodeValue child child
     modify' parent $ update childrenP (w:)
+
+    -- Add a finalizer to remove the child from the parent when the child dies.
+    case child of
+      P c@(Ref r _) -> addFinalizer r $ removeParents c
+      _             -> return ()
+
     mkWeakNodeValue child (P parent) -- child keeps parent alive
The idea is pretty trivial: when a Pulse becomes unreachable by the GC, remove it from all its parents. I think that the only things we're actually leaking (in the examples here) are the Weak value for the child and the cons cell in the parent's list of children. I tested this with the following (even simpler) repro:
{-# language BlockArguments #-}
module Main where
import Control.Monad
import Control.Monad.IO.Class
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks
import System.Mem
import System.Mem.Weak
import Control.Concurrent (threadDelay, yield)
withGhcDebug = id
main :: IO ()
main = withGhcDebug do
  (ah1, fire1) <- newAddHandler
  actuate =<< compile do
    e <- fromAddHandler ah1
    e2 <- execute $ e $> do
      accumE () (id <$ e)
    reactimate $ return () <$ e2
  performGC
  putStrLn "Running"
  replicateM_ 10000 $ do
    fire1 ()
    performMajorGC
    -- yield so finalizers can run.
    yield
  putStrLn "Done"
Run with the -hi profile and analyzed in eventlog2html, we see:
[eventlog2html heap profile of the repro above]
Some noise, but that blue line is the signal - a clear leak.
With the fix above, we get:
[eventlog2html heap profile with the fix applied]
But I also had to run 10x the number of iterations, otherwise it terminated too quickly!
So I think I've got a good handle on at least one fix. I think the way to proceed from here is to add a finalizer when we call newLatch or newPulse though - connectChild is obviously the wrong place.
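Something in this shape, perhaps (a toy sketch with stand-in types; the real Pulse, Ref, and removeParents are the ones from the diff above, and here mkWeakIORef plays the role of addFinalizer):
import Data.IORef

-- Stand-ins (assumptions) for the library's internal types.
data Pulse = Pulse { pulseRef :: IORef Int, pulseName :: String }

-- Placeholder for the real cleanup shown in the diff above.
removeParents :: Pulse -> IO ()
removeParents p = putStrLn ("culling " ++ pulseName p)

-- The proposed move: attach the cleanup finalizer once, at creation
-- time, instead of on every connectChild call.
newPulse :: String -> IO Pulse
newPulse name = do
  r <- newIORef 0
  let p = Pulse r name
  -- The finalizer may mention p without keeping r alive.
  _ <- mkWeakIORef r (removeParents p)
  pure p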
A note to myself as to why we can't just use Ref:
- Assume we have some chain of Pulses p1, p2, ... pn, where each pulse is a child of the previous (so p1 is the parent of p2, etc).
- We have some top-level pulse p which is the parent of p1.
- Now, introduce pX, which is derived by dynamically switching between some other Pulse and pn.
- If we clean up the graph the instant pX switches out of pn, then we'll end up detaching p1 from p.
- However, if we switch back to pn, then we'll never get any events, because pn is disconnected from p!
This is why we think we need help from the GC. When pX switches out of pn then yes, pX should reparent. But we do still need to keep pushing occurrences from p through pn, because the dynamic event switch keeps pn alive. I agree that it's going to be very hard to do this without letting the GC inform us.
Note that if we use dynamic event switching and switch out of pn with no possibility of switching back (e.g. something uses never, or some other mechanism makes switching back impossible), then we'd lose any strong pointer to pn, allowing it to be GCed.
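For instance (a sketch of my own, public API only), switching to never with no route back leaves the inner chain unreachable:
import Reactive.Banana

-- After the first trigger we switch to never; nothing references innerE
-- (which plays the role of pn) any more, so the GC can reclaim its chain.
switchAwayForever :: Event () -> Moment (Event Int)
switchAwayForever trigger = do
  innerE <- accumE 0 ((+ 1) <$ trigger)
  switchE innerE (never <$ trigger)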
I need to think about promptly cleaning up a whole sequence of Pulses, but otherwise this is taking shape.