boxo icon indicating copy to clipboard operation
boxo copied to clipboard

bitswap/client: memory leak on BlockPresenceManager

Open Dreamacro opened this issue 1 year ago • 5 comments

Recently, I observed that when using bitswap for fetching data, there is a large amount of memory consumption and no release after downloading.

I use pprof and find most heap is occupied here. https://github.com/ipfs/boxo/blob/89bceff34bf108a46795d26938e6aa8c2f876fc4/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go#L40

I try to print stat after bpm.presence[c] = make(map[peer.ID]bool) and I found bpm.presence would delete on BlockPresenceManager.RemoveKeys, so I added some logs.

I found that there is a race condition problem here.

  1. BlockPresenceManager.RemoveKeys has been called and remove the CidA
  2. BlockPresenceManager.ReceiveFrom has been called and created bpm.presence[CidA]
  3. bpm.presence[CidA] is no longer used or removed, and the map is getting bigger and bigger.

Dreamacro avatar Feb 01 '24 10:02 Dreamacro

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot] avatar Feb 01 '24 10:02 welcome[bot]

@Dreamacro can you attach or link to a pprof dump for analysis? Also, which versions of boxo and go-libp2p are you using?

aschmahmann avatar Feb 06 '24 14:02 aschmahmann

@aschmahmann

I first provide a few fragments of pprof/heap. I can confirm a memory leak here because the leakage disappears when I replace map[cid.Cid]map[peer.ID]bool with a cache with time. https://github.com/Dreamacro/boxo/commit/df0a587788c0e8ea70d22d051b11dae8580ad629

heap fragments: (Although boxo 0.15 is shown below, it has been upgraded to boxo 0.17)

258: 1387008 [2230: 11988480] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

109: 1185920 [3482: 37884160] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

180: 483840 [1333: 3583104] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
#	0x1c257eb	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b	github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
#	0x1c257ef	github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f		github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
#	0x1c549fb	github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b				github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
#	0x1c58add	github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd							github.com/ipfs/[email protected]/bitswap/client/client.go:336
#	0x1c58ff9	github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279							github.com/ipfs/[email protected]/bitswap/client/client.go:380
#	0x1c6df44	github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84								github.com/ipfs/[email protected]/bitswap/bitswap.go:181
#	0x1c39d30	github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530							github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
#	0x1b0a0de	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e					github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
#	0x1b09107	github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7						github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
#	0x1b32686	github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6							github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137

Dreamacro avatar Feb 07 '24 12:02 Dreamacro

@Dreamacro PR #636 is a simple code modification that should help with unbounded map growth observed in this issue. The change allows GC to free memory when the BlockPresenceManager map becomes empty, and removes peers from BlockPresenceManager that have sent repeated DONT_HAVEs. This is a smaller scoped change than using an LRU cache as you have done in the example fix you provided, and I would be interested in getting your feedback as to whether it is sufficient for your test case.

gammazero avatar Jul 08 '24 23:07 gammazero

@gammazero I can't quite confirm if this PR fixes the memory leak or not. The root cause of the problem is that after SessionManager.cancelWants is called, there may still be nodes to send HAVE or DONT_HAVE in the future, and it is handled by ReceiveFrom finally.

https://github.com/ipfs/boxo/blob/c94c23a30d459c704a6edd6a1f06c6a302b52e05/bitswap/client/internal/session/sessionwantsender.go#L458

I'm not sure if the changes here can fix the root cause of the memory leak, if HAVE/DONT_HAVE is no longer processed after the prune node, the problem should be fixed

Dreamacro avatar Jul 09 '24 07:07 Dreamacro

The memory leak seems fixed after following the upstream boxo(0.27.4).

Dreamacro avatar Feb 25 '25 08:02 Dreamacro