google-cloud-go
google-cloud-go copied to clipboard
all: consider a way to automatically Close clients
Users have to remember to Close
clients when they're done using them. Otherwise, you can end up with a memory leak. Is there a way we can automatically Close
clients?
@rsc suggested using finalizers. We need to take care to use them correctly and not cause more problems (either with leaks or doing "work" that shouldn't be done automatically).
One way to do this is to take the existing
Client
implementation and rename it, say,internalClient
. Then make a newtype Client struct { i *internalClient }
with all the documented wrappers it needs:
// M does whatever it has always done. func (c *Client) M() *T { r := c.i.M() runtime.KeepAlive(c) // delay finalizer until M is done return r }
(Note that you cannot use embedding for this because the methods will become value methods, which would be a confusing change. Also because of the need for
runtime.KeepAlive
; more below.)Then when you make a
Client
you set a finalizer on it that callsc.i.Close
.func NewClient() *Client { c := &Client{newInternalClient()} runtime.SetFinalizer(c, func(c *Client) { c.Close() }) } func (c *Client) Close() error { runtime.SetFinalizer(c, nil) return c.i.Close() }
This way, if the user does call
Close
, great, all is well. But if not, then it gets closed once the GC notices thatClient
is no longer used, theinternalClient
gets closed properly.This does require that each method wrapper like
M
above useruntime.KeepAlive
. Otherwise there is a race where during the call toc.i.M
(assuming that takes a while), the garbage collector might notice thatc
is never again used in this program and call the finalizer, causing ac.i.Close
to happen whilec.i.M
is still running. The same is true for any data structure passed back to the user that refers toc
implicitly.This can be tricky to get right, but we know where the tricks are at this point, and it would fix our users' code 100% of the time instead of pointing out only a subset of misuse. (False negatives for a checker like this are fundamental. False positives are not, but they're common too. Making everything just work all the time is probably nicer.)
cc @bcmills @codyoss @noahdietz @rsc
Sounds like a great idea. We are already half way there - we have a wrapper Client type with the transport-specific internalClient
interface as a field.
The integration tests in the client generator do make use of a copy of leakcheck
(which was copied from here, copied from grpc-go π ), which I think will aid in our development of this/getting it right.
One question: if we roll this out, is there a baked-in break glass API for users to disable our use of finalizers? Or do we need to consider adding something to toggle this off?
(I'm going to repeat here what I said on the thread with Russ.)
I'm really not a fan of implicitly closing things using finalizers as is done for os.File
. That makes it much too easy to hide a (bounded) resource leak that the user did not intend, and makes those leaks much harder to notice and track down, because they may well evaporate due to changes in GC pacing when you try to isolate them in a test.
Finalizers work better for verifying invariants than for restoring them. For example, here you could have the finalizer Close the client when run in non-race mode (to avoid OOMs in production), but terminate the process if the client is unclosed in race mode (to detect leaks during testing):
func NewClient() *Client {
c := &Client{newInternalClient()}
if raceMode {
c.addAllocationStackTrace()
}
runtime.SetFinalizer(c, (*Client).finalize)
return c
}
func (c *Client) Close() error {
runtime.SetFinalizer(c, nil)
return β¦
}
func (c *Client) finalize() {
if raceMode {
c.dumpAllocationStackTrace()
panic(fmt.Sprintf("%T became unreachable without a call to Close", c))
} else {
β― // Maybe log a warning or increment a metric here?
c.i.Close()
}
}
That provides the same benefits in production as just invoking Close
in the finalizer, but has the added benefit of also reporting the leak during (race-mode) testing, where users are more likely to notice it. (You could perhaps apply some additional hacks to enable leak detection in other tests, too β for example, you could have any public-facing test helper libraries also activate leak diagnostics at import time.)
That would also avoid the need for (fragile) use of runtime.KeepAlive
through the client implementations: rather than trying to patch up the Close
/ use race by sprinkling keepalives all over the code like glitter (β¨), it would define that race as an error and diagnose it (like any other racy operation!) when run under the race detector.
To highlight one specific ill effect from closing in finalizers: finalizers are run by the garbage collector, and the garbage collector paces its own progress (and thus the rate of finalization) in proportion to memory usage (controlled by the GOGC
environment variable).
However, the resources consumed by a client are not necessarily only Go memory: clients may also need network sockets, file descriptors, or other similar handles to kernel-managed resources. It is possible for a Go program to run out of those resources well before it runs out of memory proper, in which case the program will start to fail. This failure mode will typically happen under load β so users may be less likely to test for it adequately β and is not usually reflected in benchmarks (which typically measure memory and CPU resources, not network sockets and file descriptors).
So I believe that a solution that encourages users to rely on finalizers to close clients will lead to user programs that are more likely to fail under moderate to high loads.
IMHO a specific linter is a very good option without touching original code, like https://github.com/timakin/bodyclose does for (*net/http).Response.Close
for example.
I think we could implement this at the pool level: the pool could drain connections if it detects it has been left unused.
Every time a client makes a gRPC call, Conn()
is called.
https://github.com/googleapis/google-api-go-client/blob/v0.52.0/transport/grpc/pool.go#L41
An IMHO straight forward way is to introduce a package level function, e.g. With(f) error
taking a function f as parameter. With
is implemented as
- allocate a client
c
- defer
c.Close()
- call the function
return f(c)
- add error handling for close and handle panics as needed
A client uses it as
_ = package.With(func(c) {return doStuff(c, β¦)})
Itβs a bit more complicated, if c
can be customized on creation only, but the existing library already handles this already, With
must then replicate this initialization the same way.
pubsublite is now supporting this finalizer behavior: https://github.com/googleapis/google-cloud-go/pull/7109
Based on some recommendations from the Go language team I don't think we will move forward with auto-closing via finalizers. I think this is a docs problem which we have done audits in the past to make sure our samples all show closing clients. More recently I have also noticed auto-completes (AI based) recommend closing clients as well. If a different strategy comes up in the future on how to handle closing clients we can consider it but we will not be moving forward with a general purpose finalizer approach.