spdystream icon indicating copy to clipboard operation
spdystream copied to clipboard

What's the purpose of the 10 minute wait?

Open ryshoooo opened this issue 2 years ago • 2 comments

In the microservice I'm building, I've experienced a memory leak using the Kubernetes client-go package, which I traced back to https://github.com/moby/spdystream/blob/master/connection.go#L733.

The memory leak is really well documented in https://github.com/kubernetes/kubernetes/issues/105830.

Essentially what happens is that running the stream executors with in-cluster kubeconfig context leads to many broken pipes, which leads to the waiting time of 10 minutes before the shutdown goroutine ends. This waiting time keeps the data in memory alive. Not a big deal if it's a small number of broken pipes, but in my case the number is rising very quickly and easily can reach 2-3 GB of allocated memory within the 10-minute hold, thus considering it a memory leak of high significance (essentially a system-service pod is taking memory that the users of the cluster can't use).

The fix for this is that I replaced the waiting time from 10 minutes to 10 milliseconds instead, and the memory leak is "gone"(not truly gone, just very low and GCed quickly). However, I wonder what are the repercussions of changing https://github.com/moby/spdystream/blob/master/connection.go#L733 to milliseconds? Also, what is the point of waiting here in the first place?

ryshoooo avatar Jul 06 '23 07:07 ryshoooo

Getting same problem. @ryshoooo How did you solved this problem? did you forked this and customized?

134130 avatar Apr 23 '24 10:04 134130

Yeah I forked it and removed the wait :) which helped a lot... here's my fork https://github.com/ryshoooo/spdystream

and I just added this line into go.mod replace github.com/moby/spdystream => github.com/ryshoooo/spdystream v0.0.0-20230706002604-5d891ea12436

ryshoooo avatar Apr 23 '24 20:04 ryshoooo

I opened a partial fix for this in https://github.com/moby/spdystream/pull/99

If the error is not handled / consumed from the shutdownChan, there's still a 10 minute timeout, but if the error is consumed (as expected), there is no background goroutine spawned / leaking

liggitt avatar Jun 27 '24 16:06 liggitt

I think this can be closed now

aojea avatar Jun 28 '24 06:06 aojea

thanks @aojea @liggitt

dims avatar Jun 28 '24 10:06 dims

https://github.com/moby/spdystream/pull/99 only avoided the 10 minute wait when the close error is handled / consumed... I think there's more to do to improve the case where it is not explicitly handled / consumed ... 10 minutes is a long time to leave the shutdown call hung, I think

liggitt avatar Jun 28 '24 12:06 liggitt

yes, I've thought that, but on k8s, we are trying to change the protocol based on http2, that's the reason that I've thought it is not important.

134130 avatar Jun 28 '24 13:06 134130

opened a follow-up in https://github.com/moby/spdystream/pull/101 to shorten the timeout for the unhandled error case

liggitt avatar Jun 28 '24 16:06 liggitt