
[BUG] - Memory leak when running localChainSyncClient via connectToLocalNode recursively

Open jonathangenlambda opened this issue 1 year ago • 8 comments

Internal/External: External

Area: Other (cardano-api / connectToLocalNode / localChainSyncClient)

Summary: When localChainSyncClient is run via connectToLocalNode inside a recursion, such as forever or a function that calls itself explicitly in some cases (e.g. to catch exceptions and then retry), the application leaks memory in chainSyncClientPeerSender. Memory grows quickly and without bound while syncing (rolling forward) with the chain.
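A minimal sketch of the catch-and-retry recursion described in the summary, using only `base`. The names `runSession` and `retryLoop` are hypothetical: `runSession` stands in for the `connectToLocalNode` call (not reproduced here), and the iteration cap exists only so the sketch terminates.

```haskell
module Main (main) where

import Control.Exception (SomeException, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for `connectToLocalNode connectInfo protocols`:
-- it records that a session started, then fails the way a dropped
-- node connection would.
runSession :: IORef Int -> IO ()
runSession started = do
  modifyIORef' started (+ 1)
  throwIO (userError "connection lost")

-- The catch-and-retry recursion from the summary. The real pattern loops
-- forever; `maxIterations` only exists so this sketch stops.
retryLoop :: Int -> IO () -> IO Int
retryLoop maxIterations action = go 0
  where
    go n
      | n >= maxIterations = pure n
      | otherwise = do
          r <- try action :: IO (Either SomeException ())
          case r of
            Left e -> putStrLn ("session ended: " ++ show e ++ "; retrying")
            Right () -> putStrLn "session returned; reconnecting"
          -- `action` is the same value on every iteration, so any client
          -- data structure it references is the same heap object each run.
          go (n + 1)

main :: IO ()
main = do
  started <- newIORef 0
  n <- retryLoop 3 (runSession started)
  total <- readIORef started
  putStrLn ("iterations: " ++ show n ++ ", sessions started: " ++ show total)
```

The detail worth noticing is that one `action` value is reused by every iteration: if the protocol client it runs is built once outside the loop, that structure is shared between runs rather than rebuilt.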

Steps to reproduce

  1. Use this example as starting point: https://github.com/input-output-hk/cardano-node/blob/bench-1.35.6/cardano-client-demo/ScanBlocksPipelined.hs
  2. Put a forever around connectToLocalNode
  3. Run the example in profiling mode and add -s to the RTS options.
  4. Let the example run for a few seconds, then terminate it.
  5. Inspect the "bytes maximum residency" figure: it grows very large.
  6. Generating an eventlog with -i1.0 -hc -l-agu and rendering it to HTML with eventlog2html shows a growing memory leak attributed to chainSyncClientPeerSender; the heap information likewise shows ever-growing Heap Size, Blocks Size and Live Bytes.
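Steps 3 to 5 rely on a profiling build and `+RTS -s`. As a lighter-weight alternative (our suggestion, not part of the original report), the running maximum-residency figure can be read from inside the program via `GHC.Stats`. The helper names `toMiB`, `reportResidency` and `monitorResidency` are hypothetical, and the list workload in `main` is only a placeholder for the connectToLocalNode loop. Compile with `-rtsopts` and run with `+RTS -T` (or bake it in with `-with-rtsopts=-T`); otherwise the stats are disabled.

```haskell
module Main (main) where

import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Monad (forever)
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performMajorGC)

-- Convert a byte count to MiB for readability.
toMiB :: Word64 -> Double
toMiB b = fromIntegral b / (1024 * 1024)

-- Print the running maximum of live heap data. The RTS updates
-- max_live_bytes after each major GC; it is the figure behind
-- "bytes maximum residency" in the +RTS -s summary.
reportResidency :: String -> IO ()
reportResidency label = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "GHC.Stats disabled: run with +RTS -T"
    else do
      stats <- getRTSStats
      putStrLn (label ++ ": max residency " ++ show (toMiB (max_live_bytes stats)) ++ " MiB")

-- Sample residency every `seconds` seconds on a background thread.
monitorResidency :: Int -> IO ThreadId
monitorResidency seconds = forkIO . forever $ do
  threadDelay (seconds * 1000000)
  reportResidency "sample"

main :: IO ()
main = do
  _ <- monitorResidency 1
  -- Placeholder workload; in the issue this would be the client loop.
  let xs = [1 .. 2000000] :: [Int]
  print (sum xs, length xs) -- xs is traversed twice, so it is held in memory
  performMajorGC
  reportResidency "final"
```

In a leaking run, the periodic samples keep climbing; in a healthy one they level off after the first few major GCs.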

Expected behavior: Simply adding a recursion should not leak memory. Without forever, as in the example above, bytes maximum residency does not exceed 6 MByte when compiled with profiling and 2 MByte when compiled without it.

System info:

  • OS Name: Ubuntu
  • OS Version: 22.04
  • Node version: connecting to preprod via docker-compose inputoutput/cardano-node:${CARDANO_NODE_VERSION:-1.35.7}

jonathangenlambda · Nov 15 '23

This is not restricted to localChainSyncClient; it seems to be a general problem with how cardano-api interacts with a local Cardano node.

Here is a repo containing two minimal examples that reproduce the growing memory usage: https://github.com/jonathangenlambda/cardanoapileak

jonathangenlambda · Nov 21 '23

Thanks for the report, and minimal examples. Which compiler version are you using? We've seen some memory leaks introduced by ghc-9.2 which went away with ghc-9.4. It was fixed in a recent version of ouroboros-network-protocols.

coot · Nov 22 '23

The first suspect is the classic problem with streaming (all mini-protocols use a similar API). This is well explained in this blog post. Using -fno-full-laziness could alleviate the problem.
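To make the "classic streaming problem" concrete: as we understand it, a streaming or typed-protocols client is a lazily unfolded data structure, so if one client value is built once and reused across runs, its evaluated structure stays reachable. Full laziness can introduce that sharing even where the code appears to build a fresh value each time, by floating a loop-invariant expression out of a lambda. The toy below uses a hypothetical `Client` type, not the real `Peer` or `ChainSyncClient` types.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -fno-full-laziness #-}
module Main (main) where

import Control.Monad (forM_)

-- Hypothetical toy type, loosely in the style of a typed-protocols peer:
-- a lazily unfolded chain of protocol steps.
data Client = Step Int Client | Done

-- Describe a client that handles `n` messages (0 .. n-1).
mkClient :: Int -> Client
mkClient n = go 0
  where
    go i
      | i >= n = Done
      | otherwise = Step i (go (i + 1))

-- Interpret a client, summing the messages it handles.
runClient :: Client -> Int
runClient = loop 0
  where
    loop !acc Done = acc
    loop !acc (Step i k) = loop (acc + i) k

-- Leaky shape: one client value is built outside the loop and reused, so
-- after the first run its whole unfolded chain stays reachable until the
-- loop ends.
leaky :: Int -> Int -> IO ()
leaky iterations n = do
  let client = mkClient n
  forM_ [1 .. iterations] $ \_ -> print (runClient client)

-- Fresh shape: the client is rebuilt inside the loop body, so each chain
-- can be collected while it is consumed, provided the optimiser does not
-- float `mkClient n` out of the lambda (hence the pragma above).
fresh :: Int -> Int -> IO ()
fresh iterations n =
  forM_ [1 .. iterations] $ \_ -> print (runClient (mkClient n))

main :: IO ()
main = do
  leaky 3 1000000 -- each pass prints 499999500000
  fresh 3 1000000
```

Running `leaky` and `fresh` separately under `+RTS -s` should show the difference in maximum residency. With full laziness enabled, GHC is free to float `mkClient n` in `fresh` to an outer scope, which recreates the `leaky` shape; that is why the flag is suggested. Whether this mechanism is the actual cause of the reported leak is not established in this thread.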

coot · Nov 22 '23

I noticed you're using ghc-9.2.5 in your example. Can you reproduce it with ghc-9.4 or 9.6?

coot · Nov 22 '23

@coot Thanks for your replies.

Regarding the compiler: we are using ghc-9.2.5.

I ran the Leaky minimal example with ghc-options: -fno-full-laziness on ghc-9.2.5, but with no luck: memory still grows without bound.

I tried building the minimal example with ghc-9.4.4, but it did not compile (ouroboros-consensus-0.9.0.0 failed to build).

However, with ghc-9.6.3 (without -fno-full-laziness) compilation succeeded, and memory seemed to level off at around 125 MiB (according to the Ubuntu 22.04 System Monitor).

jonathangenlambda · Nov 23 '23

Good to hear that ghc-9.6 works for you. Can you upgrade the compiler to 9.6? If not, you could also try adding this snippet to your cabal.project.local file:

package ouroboros-network-protocols
  ghc-options: -fno-full-laziness

coot · Nov 25 '23

Unfortunately, upgrading the compiler to 9.6 is currently not an option.

I added the snippet to the cabal.project file of the example; however, the memory leak persists. When running the Leaky example, memory grows without bound according to the Ubuntu 22.04 System Monitor, reaching over 800 MiB within ~10 minutes.

jonathangenlambda · Nov 27 '23

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

github-actions[bot] · Dec 28 '23