bitswap/client: memory leak on BlockPresenceManager
Recently, I observed that when using bitswap for fetching data, there is a large amount of memory consumption and no release after downloading.
I use pprof and find most heap is occupied here. https://github.com/ipfs/boxo/blob/89bceff34bf108a46795d26938e6aa8c2f876fc4/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go#L40
I try to print stat after bpm.presence[c] = make(map[peer.ID]bool) and I found bpm.presence would delete on BlockPresenceManager.RemoveKeys, so I added some logs.
I found that there is a race condition problem here.
BlockPresenceManager.RemoveKeyshas been called and remove the CidABlockPresenceManager.ReceiveFromhas been called and createdbpm.presence[CidA]bpm.presence[CidA]is no longer used or removed, and the map is getting bigger and bigger.
Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:
- "Priority" labels will show how urgent this is for the team.
- "Status" labels will show if this is ready to be worked on, blocked, or in progress.
- "Need" labels will indicate if additional input or analysis is required.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
@Dreamacro can you attach or link to a pprof dump for analysis? Also, which versions of boxo and go-libp2p are you using?
@aschmahmann
I first provide a few fragments of pprof/heap. I can confirm a memory leak here because the leakage disappears when I replace map[cid.Cid]map[peer.ID]bool with a cache with time. https://github.com/Dreamacro/boxo/commit/df0a587788c0e8ea70d22d051b11dae8580ad629
heap fragments: (Although boxo 0.15 is shown below, it has been upgraded to boxo 0.17)
258: 1387008 [2230: 11988480] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
# 0x1c257eb github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
# 0x1c257ef github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
# 0x1c549fb github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
# 0x1c58add github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd github.com/ipfs/[email protected]/bitswap/client/client.go:336
# 0x1c58ff9 github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279 github.com/ipfs/[email protected]/bitswap/client/client.go:380
# 0x1c6df44 github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84 github.com/ipfs/[email protected]/bitswap/bitswap.go:181
# 0x1c39d30 github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530 github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
# 0x1b0a0de github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
# 0x1b09107 github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7 github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
# 0x1b32686 github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6 github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137
109: 1185920 [3482: 37884160] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
# 0x1c257eb github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
# 0x1c257ef github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
# 0x1c549fb github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
# 0x1c58add github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd github.com/ipfs/[email protected]/bitswap/client/client.go:336
# 0x1c58ff9 github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279 github.com/ipfs/[email protected]/bitswap/client/client.go:380
# 0x1c6df44 github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84 github.com/ipfs/[email protected]/bitswap/bitswap.go:181
# 0x1c39d30 github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530 github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
# 0x1b0a0de github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
# 0x1b09107 github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7 github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
# 0x1b32686 github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6 github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137
180: 483840 [1333: 3583104] @ 0xcd2266 0xcd3d10 0xcd7025 0x1c257ec 0x1c257f0 0x1c549fc 0x1c58ade 0x1c58ffa 0x1c6df45 0x1c39d31 0x1b0a0df 0x1b09108 0x1b32687 0xd32fe1
# 0x1c257eb github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).updateBlockPresence+0x22b github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:48
# 0x1c257ef github.com/ipfs/boxo/bitswap/client/internal/blockpresencemanager.(*BlockPresenceManager).ReceiveFrom+0x22f github.com/ipfs/[email protected]/bitswap/client/internal/blockpresencemanager/blockpresencemanager.go:30
# 0x1c549fb github.com/ipfs/boxo/bitswap/client/internal/sessionmanager.(*SessionManager).ReceiveFrom+0x9b github.com/ipfs/[email protected]/bitswap/client/internal/sessionmanager/sessionmanager.go:158
# 0x1c58add github.com/ipfs/boxo/bitswap/client.(*Client).receiveBlocksFrom+0x5fd github.com/ipfs/[email protected]/bitswap/client/client.go:336
# 0x1c58ff9 github.com/ipfs/boxo/bitswap/client.(*Client).ReceiveMessage+0x279 github.com/ipfs/[email protected]/bitswap/client/client.go:380
# 0x1c6df44 github.com/ipfs/boxo/bitswap.(*Bitswap).ReceiveMessage+0x84 github.com/ipfs/[email protected]/bitswap/bitswap.go:181
# 0x1c39d30 github.com/ipfs/boxo/bitswap/network.(*impl).handleNewStream+0x530 github.com/ipfs/[email protected]/bitswap/network/ipfs_impl.go:431
# 0x1b0a0de github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).SetStreamHandler.func1+0x3e github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:598
# 0x1b09107 github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).newStreamHandler+0x6a7 github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:440
# 0x1b32686 github.com/libp2p/go-libp2p/p2p/net/swarm.(*Conn).start.func1.1+0xa6 github.com/libp2p/[email protected]/p2p/net/swarm/swarm_conn.go:137
@Dreamacro PR #636 is a simple code modification that should help with unbounded map growth observed in this issue. The change allows GC to free memory when the BlockPresenceManager map becomes empty, and removes peers from BlockPresenceManager that have sent repeated DONT_HAVEs. This is a smaller scoped change than using an LRU cache as you have done in the example fix you provided, and I would be interested in getting your feedback as to whether it is sufficient for your test case.
@gammazero I can't quite confirm if this PR fixes the memory leak or not. The root cause of the problem is that after SessionManager.cancelWants is called, there may still be nodes to send HAVE or DONT_HAVE in the future, and it is handled by ReceiveFrom finally.
https://github.com/ipfs/boxo/blob/c94c23a30d459c704a6edd6a1f06c6a302b52e05/bitswap/client/internal/session/sessionwantsender.go#L458
I'm not sure if the changes here can fix the root cause of the memory leak, if HAVE/DONT_HAVE is no longer processed after the prune node, the problem should be fixed
The memory leak seems fixed after following the upstream boxo(0.27.4).