cardano-node
[BUG] - Memory leak when running localChainSyncClient via connectToLocalNode recursively
Internal/External
External
Area
Other
cardano-api / connectToLocalNode / localChainSyncClient
Summary
When localChainSyncClient is run via connectToLocalNode inside a recursion, such as forever or a function that calls itself explicitly (e.g. to catch exceptions and then retry), the application leaks memory in chainSyncClientPeerSender. Memory grows quickly and without bound while syncing (rolling forward) with the chain.
Steps to reproduce
- Use this example as a starting point: https://github.com/input-output-hk/cardano-node/blob/bench-1.35.6/cardano-client-demo/ScanBlocksPipelined.hs
- Put a forever around connectToLocalNode
- Run the example in profiled mode / add -s to rtsopts
- Let the example run for a few seconds and terminate it
- Inspect bytes maximum residency: it is growing very large.
- When generating an eventlog via -i1.0 -hc -l-agu and generating an HTML report via eventlog2html, the report shows a growing memory leak produced by chainSyncClientPeerSender, and the heap information shows ever-growing Heap Size, Blocks Size and Live Bytes.
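
The change described in the second step can be sketched as follows. This is only an illustration of the shape of the recursion, not the code from the linked example: connectStub is a hypothetical stand-in for the connectToLocalNode call and does not talk to a node, and reconnectForever and retryLoop are made-up names. The retry variant takes an attempt budget purely so that the sketch terminates.

```haskell
import Control.Exception (SomeException, try)
import Control.Monad (forever)

-- Hypothetical stand-in for the `connectToLocalNode ...` call in the demo.
-- The real call blocks while the chain-sync client runs; this stub returns at once.
connectStub :: IO ()
connectStub = pure ()

-- Variant 1: wrap the connection in `forever` (never returns).
reconnectForever :: IO a
reconnectForever = forever connectStub

-- Variant 2: explicit recursion that reconnects after the action finishes
-- or throws. The budget is an assumption of this sketch, not of the report.
retryLoop :: Int -> IO () -> IO Int
retryLoop budget action = go 0
  where
    go n
      | n >= budget = pure n
      | otherwise = do
          -- Swallow any exception and try again.
          _ <- try action :: IO (Either SomeException ())
          go (n + 1)

main :: IO ()
main = do
  attempts <- retryLoop 3 connectStub
  putStrLn ("attempts: " ++ show attempts)
```

In the report, either variant around the real connectToLocalNode call is enough to trigger the growth; the stub above will not reproduce it.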
Expected behavior
Simply adding a recursion should not leak memory. Without forever (i.e. the unmodified example linked above), bytes maximum residency does not exceed 6 MB when compiled with profiling and 2 MB when compiled without profiling.
System info (please complete the following information):
- OS Name: Ubuntu
- OS Version: 22.04
- Node version: connecting to preprod via docker-compose
inputoutput/cardano-node:${CARDANO_NODE_VERSION:-1.35.7}
This is not restricted to localChainSyncClient but seems to be a general problem of how Cardano Api interacts with a local Cardano node.
Here is a repo which contains 2 minimal examples that reproduce the behaviour of growing memory: https://github.com/jonathangenlambda/cardanoapileak
Thanks for the report, and minimal examples. Which compiler version are you using? We've seen some memory leaks introduced by ghc-9.2 which went away with ghc-9.4. It was fixed in a recent version of ouroboros-network-protocols.
The first suspect is the classic problem with streaming (all mini-protocols are using a similar API). This is well explained in this blog post. Using -fno-full-laziness could alleviate the problem.
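
The sharing problem mentioned here can be illustrated without any Cardano code. The following is a minimal sketch, not the actual ouroboros-network client; sharedStream, freshStream and consume are made-up names. A stream defined as a constant is a single heap object that is unfolded once and can stay reachable between iterations of a loop, whereas a stream produced by a function is rebuilt on each call and can be collected while it is consumed. With optimisation enabled, GHC's full-laziness pass may float the body of such a function out into a shared constant, which is what -fno-full-laziness is meant to prevent.

```haskell
import Data.List (foldl')

-- A constant stream: one heap object, evaluated at most once. While code
-- that can still run refers to it, the evaluated cells stay live.
sharedStream :: [Int]
sharedStream = [1 .. 1000000]

-- A stream behind a function: semantically a fresh list per call. With
-- -ffull-laziness (part of -O) GHC may float the body out into a shared
-- constant, turning this back into the case above.
freshStream :: () -> [Int]
freshStream () = [1 .. 1000000]

-- Strict left fold; runs in constant space only if the list cells it has
-- already visited can be garbage collected.
consume :: [Int] -> Int
consume = foldl' (+) 0

main :: IO ()
main = do
  -- Second pass reuses the list materialised by the first pass.
  print (consume sharedStream)
  print (consume sharedStream)
  -- Each pass may build and discard its own list.
  print (consume (freshStream ()))
  print (consume (freshStream ()))
```

All four lines print the same sum; the difference is only in what stays on the heap between passes, which can be observed with the -s or heap-profiling RTS options.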
I noticed you're using ghc-9.2.5 in your example. Can you reproduce it with ghc-9.4 or 9.6?
@coot Thanks for your replies.
Regarding compiler: we are using 9.2.5
I ran the Leaky minimal example with ghc-options: -fno-full-laziness on ghc-9.2.5 but no luck, the memory grows unbounded.
I tried running the minimal example with ghc-9.4.4 but it didn't compile (ouroboros-consensus-0.9.0.0 failed).
However, with ghc-9.6.3 (without using -fno-full-laziness) compilation worked and the memory seemed to level off at around 125 MiB (according to Ubuntu 22.04 System Monitor).
Good to hear that ghc-9.6 works for you. Can you upgrade the compiler to 9.6? If not you could also try to add this snippet to your cabal.project.local file:
package ouroboros-network-protocols
  ghc-options: -fno-full-laziness
Unfortunately, upgrading the compiler to 9.6 is currently not an option.
I added the snippet to the cabal.project file of the example. However, the memory leak persists: when running the Leaky example, memory grows without bound according to Ubuntu 22.04 System Monitor, reaching over 800 MiB within ~10 minutes.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.