Swarm.ResourceMgr.MaxMemory vs GOMEMLIMIT
Checklist
- [X] This is a bug report, not a question. Ask questions on discuss.ipfs.tech.
- [X] I have searched on the issue tracker for my bug.
- [X] I am running the latest kubo version or have an issue updating.
Installation method
dist.ipfs.tech or ipfs-update
Version
v0.32.1
Config
https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgrmaxmemory
Description
Problem
Swarm.ResourceMgr.MaxMemory is often interpreted and used as "memory limit for Kubo", while it is only passed to go-libp2p and used for a limited set of things (transports, but not bitswap).
Ref. https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#readme
The default Swarm.ResourceMgr.MaxMemory is dynamic and conservative:
Default: [TOTAL_SYSTEM_MEMORY]/2
Type: optionalBytes
The memory limit people actually want to set is for the entire Kubo process, and that is most likely GOMEMLIMIT, but we do not set it by default.
This means:
- some OOMs could be avoided, but are not
- some leaks could be mitigated, but are not
- users who set Swarm.ResourceMgr.MaxMemory are still getting OOMs, because it does not cover all the bases
- bad UX, and perception of "Kubo being memory hog"
Solution brainstorm
- Have a sensible (dynamic) limit by default
  - Implicit default should be something conservative like we have in Swarm.ResourceMgr.MaxMemory. [TOTAL_SYSTEM_MEMORY]*0.75?
  - Linux users / Docker use cgroups (control groups) limits extensively to manage and constrain resource usage. We could leverage something like https://github.com/KimMachineGun/automemlimit / https://kupczynski.info/posts/go-container-aware/ to seamlessly respect the memory limits that are set
- Allow user to override default
  - Respect GOMEMLIMIT if it is present in env; it should always take precedence, and the implicit default for Swarm.ResourceMgr.MaxMemory should be calculated from that ceiling.
  - Refuse to start if user manually set Swarm.ResourceMgr.MaxMemory higher than implicit (or explicit) GOMEMLIMIT
  - TBD: how to solve the UX problem of a user searching for memory in config and only finding ResourceMgr.MaxMemory
    - Perhaps have clear Limits.MaxMemory? Or generic Env.* and support setting Env.GOMEMLIMIT via config?
- Detect when users are running with less memory, or lower limits, than https://github.com/ipfs/kubo?tab=readme-ov-file#minimal-system-requirements
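To make the precedence rules above concrete, here is a minimal sketch in Go. It is not Kubo code: the helper names (parseCgroupMax, effectiveLimit, checkSwarmMaxMemory), the 8 GiB host, and the sample cgroup value are all made up for illustration, and the 0.75 factor is just the number floated above. The only runtime API used is debug.SetMemoryLimit, where a negative argument reads the current limit (math.MaxInt64 when GOMEMLIMIT is unset).

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
	"strconv"
	"strings"
)

// noLimit is what debug.SetMemoryLimit(-1) reports when GOMEMLIMIT is unset.
const noLimit = math.MaxInt64

// parseCgroupMax parses the contents of a cgroup v2 memory.max file.
// The literal "max" (no limit) and unparsable input both yield ok == false.
func parseCgroupMax(s string) (limit uint64, ok bool) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// effectiveLimit returns the soft limit to hand to the Go runtime.
// A limit that already came from GOMEMLIMIT always wins; otherwise the
// implicit default is 75% of availableMem (assumed factor, see lead-in).
func effectiveLimit(envLimit int64, availableMem uint64) int64 {
	if envLimit != noLimit {
		return envLimit
	}
	return int64(availableMem / 4 * 3)
}

// checkSwarmMaxMemory rejects a Swarm.ResourceMgr.MaxMemory value that is
// higher than the process-wide limit.
func checkSwarmMaxMemory(swarmMax uint64, limit int64) error {
	if limit >= 0 && swarmMax > uint64(limit) {
		return fmt.Errorf("configured Swarm.ResourceMgr.MaxMemory (%d) exceeds memory limit (%d)", swarmMax, limit)
	}
	return nil
}

func main() {
	available := uint64(8 << 30) // hypothetical host with 8 GiB of RAM
	// Hypothetical memory.max contents; a real program would read the file.
	if cg, ok := parseCgroupMax("2147483648\n"); ok && cg < available {
		available = cg // a container limit lowers the ceiling
	}

	envLimit := debug.SetMemoryLimit(-1) // negative input only reads the limit
	limit := effectiveLimit(envLimit, available)
	debug.SetMemoryLimit(limit)
	fmt.Println("soft memory limit:", limit)

	// A user-configured 4 GiB MaxMemory is checked against the ceiling.
	if err := checkSwarmMaxMemory(4<<30, limit); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```

Total system memory is passed in as a parameter because the standard library has no portable call for it; a real implementation would need a platform-specific source or a library such as automemlimit.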
cc @sukunrt @ipfs/kubo-maintainers for feedback / ideas.
I cannot think of a program on my computer that has a "MaxMemory" setting and starts torpedoing itself when it is crossed, so I think this type of setting is not such a good idea, even though it looked pretty good when introduced as a way of making automated decisions on the actual limits.
MaxMemory is a bad proxy for something that is mostly a "MaxConnections" thing. I think traditionally p2p programs offered settings like:
- Max connections - offered by proxy through MaxMemory
- Max up/down bandwidth - nonexistent
- Max download workers (i.e. parallel downloads) - 10 different knobs related to bitswap, reprovider etc, some exposed and some not
GOMEMLIMIT, on the other hand, forces Go to GC more often and is not a real limit; it possibly just makes performance worse, and it is going to confuse users in the same way. Users can adjust it, though, if their memory usage pattern needs it.
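A quick way to see the "not a real limit" part is a small Go sketch like the one below. It uses debug.SetMemoryLimit, the programmatic counterpart of GOMEMLIMIT; the function name allocatePast and the 32 MiB figure are made up for the example. The program keeps twice the limit alive on the heap and the runtime keeps going instead of failing the allocations.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// allocatePast sets a soft memory limit, keeps roughly twice that amount
// alive on the heap, and returns the heap size the runtime reports.
// The previous limit is restored before returning.
func allocatePast(limit int64) uint64 {
	prev := debug.SetMemoryLimit(limit)
	defer debug.SetMemoryLimit(prev)

	var live [][]byte
	for held := int64(0); held < 2*limit; held += 1 << 20 {
		live = append(live, make([]byte, 1<<20)) // 1 MiB chunks stay reachable
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	runtime.KeepAlive(live) // keep the chunks reachable until after the measurement
	return ms.HeapAlloc
}

func main() {
	const limit = 32 << 20 // hypothetical 32 MiB soft limit
	heap := allocatePast(limit)
	fmt.Printf("limit=%d bytes, live heap=%d bytes\n", limit, heap)
}
```

The live heap ends up above the limit because the limit only makes the GC work harder as usage approaches it; memory that is still referenced cannot be freed, so a leak or a genuinely large working set will still grow past it.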
Also, it might be that the problem is that Kubo IS a memory hog due to constant bugs in QUIC. I am not sure if there was ever a Kubo release that did not suffer from QUIC leaking memory in one way or another. I might be wrong, but my impression is that QUIC is a permanent suspect in all discussions, release after release. As for ideas:
- Fix QUIC
- Remove MaxMemory and expose instead MaxConnections, MaxStreamsPerConnection (inbound/outbound limits can probably be derived, or exposed directly). This would at least lower expectations and be more explicit on what gets limited.
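As a rough sketch of what the second idea could look like, here is a hypothetical config shape in Go. None of these types or fields exist in Kubo or go-libp2p; ConnLimits, DerivedLimits and derive are invented for illustration, and the even inbound/outbound split is an arbitrary assumption about how the derived limits might be computed.

```go
package main

import "fmt"

// ConnLimits is a hypothetical user-facing config section.
type ConnLimits struct {
	MaxConnections          int
	MaxStreamsPerConnection int
}

// DerivedLimits holds the values computed from ConnLimits.
type DerivedLimits struct {
	MaxInbound  int
	MaxOutbound int
	MaxStreams  int
}

// derive splits the connection budget evenly between inbound and outbound
// (an arbitrary assumption) and caps total streams at
// MaxConnections * MaxStreamsPerConnection.
func derive(c ConnLimits) DerivedLimits {
	inbound := c.MaxConnections / 2
	return DerivedLimits{
		MaxInbound:  inbound,
		MaxOutbound: c.MaxConnections - inbound,
		MaxStreams:  c.MaxConnections * c.MaxStreamsPerConnection,
	}
}

func main() {
	d := derive(ConnLimits{MaxConnections: 100, MaxStreamsPerConnection: 16})
	fmt.Printf("inbound=%d outbound=%d streams=%d\n", d.MaxInbound, d.MaxOutbound, d.MaxStreams)
}
```

The point of a shape like this is that every knob names the thing it limits, so nobody reads it as a memory cap for the whole process.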