kubo icon indicating copy to clipboard operation
kubo copied to clipboard

feat: daemon: automatically set GOMEMLIMIT if it is unset

Open Jorropo opened this issue 3 years ago • 8 comments

I have a rather big collection of profiles where someone claims that Kubo is ooming on XGiB. Then you open the profile and it is using half of that, this is due to the default GOGC=200%. That means, go will only run the GC once it's twice as being as the previous alive set.

This situation happen more than it should / almost always because many parts of Kubo are memory garbage factories.

Adding a GOMEMLIMIT helps by trading off more and more CPU running GC more often when memory is about to run out, it's not healthy to run at the edge of the limit because the GC will continously run killing performance. So this doesn't double the effective memory usable by Kubo, but we should expect to be able to use ~1.5x~1.75x before performance drastically falling off.

Closes: #8798

Jorropo avatar Dec 03 '22 15:12 Jorropo

Would it perhaps make sense to default this to something (based on) the resource manager's limit? https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgrmaxmemory

dokterbob avatar Dec 04 '22 14:12 dokterbob

@Jorropo : I like the idea of this but in practice is this actually going to help with resource manager complaints since the go-libp2p resource manager does its own state tracking for resource usage? (I agree it will help Kubo less system resources.)

If it's not going to help with the errors/alarming, I suggest we push this out to next release since we're already trying to squeeze a lot in between now and 2022-12-08 and would rather divert our attention to other items for 0.18.

BigLep avatar Dec 06 '22 06:12 BigLep

@BigLep this does not change anything to the error logging, It helps with peoples reporting OOMs.

Jorropo avatar Dec 06 '22 06:12 Jorropo

@BigLep this does not change anything to the error logging, It helps with peoples reporting OOMs.

Also, ideally, significantly less GC disruptions on high-use nodes.

dokterbob avatar Dec 07 '22 20:12 dokterbob

2023-01-31: we pushed to next iteration because of concerns about getting caught in CPU death spiral and not actually dying. This requires extra discussion and we don't have bandwidth for it now.

BigLep avatar Jan 31 '23 17:01 BigLep

❤️

dokterbob avatar May 09 '23 11:05 dokterbob

I checked the CPU death spiral was some other bug. The Go runtime limit itself to gcing at no more than 1hz. I don't know of any issue for Kubo v0.22. The magic might be a bit dodgy, I think it's reasonable but this needs to be looked at carefully ?

Jorropo avatar Jun 29 '23 13:06 Jorropo

@Jorropo : what are the next steps here? Is this realistically going to get merged the next month?

BigLep avatar Jan 03 '24 22:01 BigLep