gotk4 icon indicating copy to clipboard operation
gotk4 copied to clipboard

Fix self-referencing closure memory leaks

Open diamondburned opened this issue 9 months ago • 24 comments

This commit experiments with refactoring package core/intern to use the newly added runtime.AddCleanup and weak.Pointer APIs from Go 1.24.

This commit has been tested against issues https://github.com/diamondburned/gotk4/issues/106 and https://github.com/diamondburned/gotk4/issues/126 to ensure stability. It is still not recommended to use this in production until further testing has been done, however.

diamondburned avatar Feb 22 '25 23:02 diamondburned

An easy fix for #106 and #126 is the following, I don't know if this can be solved any other way:

app := gtk.NewApplication("com.github.diamondburned.gotk4-examples.gtk4.simple", gio.ApplicationFlagsNone)
weakApp := weak.Make(app)
app.ConnectActivate(func() {
	app := weakApp.Value()
	if app == nil {
		return // activated after cleanup?
	}
	window1 := gtk.NewApplicationWindow(app)

	// ...
})

The problem is that the signal handler closes over app (you can see this in delve by stepping into ConnectActivate and looking at the function). This means that the function itself prevents the cleanup of app, but because the app is never cleaned up, the signal handler closure is never removed either, keeping the function alive, which keeps the app alive.

There are two ways to solve this with the current bindings:

  1. never close over the instance or anything that has a (strong) reference to it. This is quite hard and can be easily done accidentally, e.g. with nested structs.
  2. detach the signal handler manually with Disconnect when you are done with the object. This breaks the cycle.

There is no other way of doing this from the bindings, because you cannot know which references an object has that the closure closes over, because these references might be in the C code.

RSWilli avatar Feb 23 '25 14:02 RSWilli

This PR already works for those 2 issues without needing the user to explicitly use weak references themselves.

My main goal with keeping this a draft right now is just to encourage others to test it and report any new bugs/regressions before I go ahead and merge it.

diamondburned avatar Feb 24 '25 01:02 diamondburned

@RSWilli

There is no other way of doing this from the bindings, because you cannot know which references an object has that the closure closes over, because these references might be in the C code.

There actually is!

gotk4 accomplishes this by using Go's weak pointers and GLib's toggle references to be able to detect when to appropriately free objects for this reason. Specifically:

  1. When the object is first allocated, in most cases, the caller is assumed to be the owner, so the object is referenced from the Go side. The toggle reference callback tells us this.
    • At this point, the object is stored in Go as a weak pointer. More on this later [1].
  2. When the object is shared to the C side (e.g., a box is being displayed), C takes its own reference, and the toggle reference callback updates us.
    • At this point, the object is stored in Go as a strong pointer.
    • The object's signal callbacks are stored within the object itself rather than globally stored on its own.
  3. When a signal is emitted, gotk4 does a lookup on the associated object to find the callback to be invoked.
  4. When the object is no longer needed, C stops taking references to it. This drops the reference count back to 1 and the toggle reference callback updates us. The object will be stored once again as a weak pointer.
  5. At this point, Go will also notify us as soon as no other Go-side references are made to the object, as it is kept as a weak pointer [1]. Once this assurance has been made, we can start tearing down our reference as usual. [2]
  6. Once our reference has been torn down, GLib will start calling all the finalizers (including widget destroy signal callbacks), then free up the object.

[2]: This should answer your code review for shared.weak[obj] = weak.Make(box)!

diamondburned avatar Feb 24 '25 04:02 diamondburned

I believe GJS also uses toggle references for this reason.

diamondburned avatar Feb 24 '25 04:02 diamondburned

Also side note:

Specifically for issue #106, referencing window inside the destroy signal callback will not work. This makes sense, as it's meant to be a destruction callback, so you may as well assume that window is an invalid object by that point. It also only applies to the destroy signal itself.

diamondburned avatar Feb 24 '25 04:02 diamondburned

Go will also notify us as soon as no other Go-side references are made to the object, as it is kept as a weak pointer

How would this work in the following use case? It is using gstreamer elements, because I'm not familiar with gtk.

el := gst.ElementFactoryMake("audiotestsrc", "").(*gst.Element)

el.ConnectPadAdded(func(pad *gst.Pad) {
    el.GetObjectProperty("")
})

runtime.GC()

// potentially more references to el after connecting
el.SetObjectProperty("", "")
  1. There is only one reference on el, which is controlled by the go GC.
  2. The only reference to the func is in the bindings in the closure box.
  3. the element may be used after connecting a signal, but not guaranteed

We have some places to break the cycle here:

  1. in the element itself: reference the finalized struct with a weak pointer. This means that we either need a second reference somewhere else, or we risk premature cleanup
  2. in the box, where the closure is referenced: this would mean that the GC will clean up the function, resulting in an error when the signal is finally emitted.

The thing with this example is that there is no additional reference taken. It is just the closure that prevents the GC of the element.

IMO doing anything else with weak pointers in the bindings will not work in all cases and will potentially introduce hard to find bugs.

RSWilli avatar Feb 24 '25 10:02 RSWilli

The thing with this example is that there is no additional reference taken. It is just the closure that prevents the GC of the element.

Then what's the point of this object existing? It would have to make it over to the C side eventually, because the core logic of everything is in C, so if it's not, then what is it doing by just being on the Go side?

If el is used after the signal is bound but only on the Go side, then as soon as it is not being used anymore, Go's GC will run the finalizers needed and free up the object. This works the same as what I wrote, minus steps 2 to 5, pretty much.

diamondburned avatar Feb 24 '25 10:02 diamondburned

Then what's the point of this object existing?

The object can still create threads and do stuff on the c side. Toplevel structs like pipelines in gstreamer are owned by the user and do everything in C, but the only reference there is exists in go. That means that this will leak if we create a reference cycle, but be prematurely cleaned up if we drop the reference.

I talked to @sdroege about this and he thinks that toggle refs will not solve the problem and potentially create more issues. Quoting him:

  1. toggle refs are not composable, meaning that if gotk4 calls into code created by other bindings using toggle refs it will break. This is definitely a use case with gstreamer plugins.
  2. There are more usecases where you can accidentally create a cycle without knowing it. E.g. referencing an object that has a reference to the object where the closure is attached, e.g. with callbacks (easily reproducable with signals)

His pseudo example:

pipeline.bus().add_watch(|bus, msg| {
    pipeline.do_things();
});

The pipeline owns the bus, the bus has a watch and the watch references the pipeline -> cycle! There is no way to detect this from the runtime. Python, JS, and rust bindings also leak in this case. The user has to resort to weak pointers here.

RSWilli avatar Feb 24 '25 10:02 RSWilli

Python, JS, and rust bindings also leak in this case. The user has to resort to weak pointers here.

I think in this case, I would be happy with letting the user handle this for Go as well then. Still, I'm not willing to compromise the ease of use for all of GTK just for a handful of edge cases.

  1. toggle refs are not composable, meaning that if gotk4 calls into code created by other bindings using toggle refs it will break. This is definitely a use case with gstreamer plugins.

This is a fair point! How does GJS and other stuff handle this? They definitely rely on toggle references too, AFAICT.

diamondburned avatar Feb 24 '25 11:02 diamondburned

How does GJS and other stuff handle this?

They don't (and Python AFAIK doesn't even have a GWeakRef binding) but GTK has a hack to make this less of a problem: closing a window disposes all widgets inside it, and that gets rid of all the signal handlers and by that breaks most of the reference cycles.

sdroege avatar Feb 24 '25 11:02 sdroege

Still, I'm not willing to compromise the ease of use for all of GTK just for a handful of edge cases.

I'm not familiar with GTK, but I don't think that these cases are that rare in reality. If you never reference the object itself in the closure, then the current implementation already works fine. I'd expect this to be true for e.g. buttons that perform some action without giving user feedback (to not reference any other object in the tree). But if you want to show feedback then you'd need a handle to your app or similar, which then creates a circular reference again.

In gstreamer the trivial cases also do not produce that issue, but once you get more advanced then you will somehow react to the signal which easily creates reference cycles.

Python AFAIK doesn't even have a GWeakRef binding

The good thing is that since go 1.24 we don't even need new bindings for this, we can just pass a go weak pointer.

RSWilli avatar Feb 24 '25 12:02 RSWilli

A way to warn the user would be to inspect the closure context for anything that contains a gotk4 internal object. This is not documented at all though, I tried to look at how delve does it, and they are just casting the function to a struct to inspect the internal pointers.

But I think that once you go there you open pandora's box. Doing this at runtime could potentially introduce massive performance penalties.

RSWilli avatar Feb 24 '25 12:02 RSWilli

But if you want to show feedback then you'd need a handle to your app or similar, which then creates a circular reference again.

This case is being handled perfectly fine by this PR, with everything being GC'd as normal.

I believe the edge case here is more the "pipeline owns the bus, the bus has a watch and the watch references the pipeline -> cycle" part though. It would help me tremendously if there's like a directed graph diagram for this sort of thing...

The good thing is that since go 1.24 we don't even need new bindings for this, we can just pass a go weak pointer.

I'm not too sure, actually. Right now, gotk4 explicitly supports GWeakRef for this exact issue, but it does not use Go's weak pointer. The GWeakRef is moreso to obtain a weak pointer to a C object.

diamondburned avatar Feb 24 '25 22:02 diamondburned

I'm not too sure, actually

As shown above: https://github.com/diamondburned/gotk4/pull/161#issuecomment-2676891735 , this prevents the closure from closing over the finalized object, it instead closes over the weak pointer, which allows the underlying C object to be cleaned up.

This case is being handled perfectly fine by this PR, with everything being GC'd as normal.

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle?

  1. Click handler on a button that triggers an action on widget X
  2. X handles the action, for which it doesn't need any other dependency
  3. but X also has a reference to widget Y for another usecase
  4. Y disables the Button if X triggers the right action for it.

If this is all happening in go then fine, but as soon as one of those components is implemented e.g. in C then there is more than one reference taken, and a leak would be undetectable.

It would help me tremendously if there's like a directed graph diagram for this sort of thing...

That would require traversing the specific implementation of each object. This would require big changes to GObject.

RSWilli avatar Feb 25 '25 08:02 RSWilli

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle? [...]

Once gtk_widget_destroy() / gtk_window_destroy() is called (e.g. when closing the window), the two signal handlers are disconnected, which breaks the reference cycle in two places.

If that wasn't the case with GTK, I don't see either how this wouldn't leak.

sdroege avatar Feb 25 '25 09:02 sdroege

I don't understand how your case is handled if my example gives you headaches. How does your solution prevent something similar to this from being a cycle?

I think you should just give the PR a try. I think you and I are describing very different edge cases to the same problem, so having reproducible code (or at least a simple directed graph diagram illustrating the problem) would help me a lot in figuring out the right solution to both cases. As it stands, both #106 and #126 work with this PR, and nothing gets leaked.

diamondburned avatar Feb 26 '25 06:02 diamondburned

FWIW, I agree that this PR might not work for GStreamer specifically, and I'd be happy to draft up some kind of opt-out mechanism for GStreamer objects to not be memory-managed in such manner.

In the GTK world, there's not a lot of multi-language interop, though. It's usually just C and Go, maybe Vala.

diamondburned avatar Feb 26 '25 06:02 diamondburned

I get that it fixes the specific problems, but I find the intern package hard to reason about for several reasons: Objects get resurrected, toggle refs check for GC on the C side, etc.

I have one more question: Who holds a strong reference to the box if it gets moved to the intern weak part? From what I understand, the box holds the go closure that needs to be called in the go marshal function.

I'll experiment a bit with the changes.

RSWilli avatar Feb 26 '25 09:02 RSWilli

Who holds a strong reference to the box if it gets moved to the intern weak part?

If the box is intern-weaked, then that implies that the C side holds no reference to this box. The only other references would just be other Go code that still needs this object. Once that's no longer true, the box gets freed.

diamondburned avatar Feb 26 '25 09:02 diamondburned

Wait - the box holds the object, not the closure?

The closure is saved in a global Slab here: https://github.com/diamondburned/gotk4/blob/4/pkg/core/gbox/box.go#L88

A slab doesn't have any weak pointers - because it mustn't have any. Meaning that you always have a strong reference to to the closure.

If the closure closes over the object, then the object is strongly referenced too, meaning that it will never leave go's heap. That means that you still easily leak the object, because it is always alive on the go heap. I tried this branch with the gst bindings, and did the following:

package main

import (
	"log"
	"runtime"
	"time"

	"github.com/go-gst/go-gst/pkg/gst"
)

func mkPipeline() *gst.Pipeline {
	ret, err := gst.ParseLaunch("audiotestsrc ! decodebin name=decodebin0 ! fakesink")

	if err != nil {
		panic(err)
	}

	pipeline := ret.(*gst.Pipeline)

	el := pipeline.ByName("decodebin0").(*gst.Bin)

	h := el.ConnectPadAdded(func(newPad *gst.Pad) {
		el.SetName("foobar") // dummy method, to make the handler close over el
	})

	runtime.AddCleanup(el, func(_ struct{}) {
		log.Println("garbage collected element")
	}, struct{}{})

	// runtime.AddCleanup(pipeline, func(el *gst.Bin) {
	// 	el.HandlerDisconnect(h)
	// }, el)

	return pipeline
}

func main() {
	gst.Init()

	for range 100 {
		pipeline := mkPipeline()

		runtime.GC()

		pipeline.State(gst.ClockTimeNone)

		runtime.GC()
		runtime.GC()
		runtime.GC()
		runtime.GC()
	}

	runtime.GC()
	time.Sleep(2 * time.Second)
	runtime.GC()
}

This creates a new pipeline, retrieves an object from it, attaches a signal handler and adds a cleanup to the object, where it should log "garbage collected element". This line never happens, because of the reference cycle. el is closed by the closure and thus always strongly referenced.

There are two ways to break the cycle here:

  • either use a weak pointer to access el in the signal handler
  • or disconnect the signal handler on el when pipeline gets cleaned up (commented out above)

In both cases the log line appears. I don't think that this PR is doing what you want to @diamondburned , and I don't know if it is even possible.

RSWilli avatar Feb 26 '25 12:02 RSWilli

Wait - the box holds the object, not the closure?

The closure is saved in a global Slab here: https://github.com/diamondburned/gotk4/blob/4/pkg/core/gbox/box.go#L88

This isn't true.

The box holds both the object and the closures. See intern.Box type.

gbox is only used to save weak pointers for recalling auxiliary data.

This line never happens, because of the reference cycle. el is closed by the closure and thus always strongly referenced.

You'd want to enable log/slog with debug levels then run the program with GOTK4_DEBUG=all, then watch the allocations.

In both cases the log line appears. I don't think that this PR is doing what you want to @diamondburned , and I don't know if it is even possible.

-_-

Why are you acting like I've not done any testing myself? Have you even bothered looking into #126 and running it with this PR? I never claimed it'll work for GStreamer.

diamondburned avatar Feb 26 '25 16:02 diamondburned

You'd want to enable log/slog with debug levels then run the program with GOTK4_DEBUG=all, then watch the allocations.

Relevant log section:

2025/02/26 17:28:10 DEBUG allocating new box for object stack="goroutine 1 [running]:\nruntime/debug.Stack()\n\t/nix/store/vh5d5bj1sljdhdypy80x1ydx2jx6rv2q-go-1.24.0/share/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/diamondburned/gotk4/pkg/core/intern.newBox(0x73f658004050)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/intern/intern.go:173 +0xb0\ngithub.com/diamondburned/gotk4/pkg/core/intern.Get(0x73f658004050, 0x0)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/intern/intern.go:230 +0x69\ngithub.com/diamondburned/gotk4/pkg/core/glib.newObject(...)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/glib/glib.go:637\ngithub.com/diamondburned/gotk4/pkg/core/glib.AssumeOwnership(...)\n\t/home/wbartel/projects/go-gst/gotk4/pkg/core/glib/glib.go:627\ngithub.com/go-gst/go-gst/pkg/gst.(*Bin).ByName(0xc0000e6030, {0x64a5fb, 0xa})\n\t/home/wbartel/projects/go-gst/go-gst/pkg/gst/gst.go:14143 +0x117\nmain.mkPipeline()\n\t/home/wbartel/projects/go-gst/go-gst/examples/signals/main.go:21 +0x59\nmain.main()\n\t/home/wbartel/projects/go-gst/go-gst/examples/signals/main.go:54 +0x24f\n" gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG Get: will introduce new box for object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG Get: added toggle reference to object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=3
2025/02/26 17:28:10 DEBUG Get: not taking, so unref'd the object gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=2
2025/02/26 17:28:10 DEBUG cleaning up object box=true gobject.ptr=0x73f658007e50 gobject.type=GstPipeline gobject.refs=1
2025/02/26 17:28:10 DEBUG makeWeak: obtained box strong=true box=true gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=1
2025/02/26 17:28:10 DEBUG goToggleNotify: toggled notify for object is_last=true finalizing=false gobject.ptr=0x73f658004050 gobject.type=GstDecodeBin gobject.refs=1
2025/02/26 17:28:10 DEBUG post-finalizer aftermath for box is_weak=true is_strong=false gobject.ptr=0x73f658007e50 gobject.type=GstPipeline gobject.refs=1

I think the reference cycle happens in the box itself? I'm not sure, but I can definitely say that the objects are not getting cleaned up and are still being referenced strongly by go.

I never claimed it'll work for GStreamer.

Sorry if it sounded rude, but what are you trying to solve here? A bug in the GTK bindings or a bug in the GLib bindings?

I have not looked into the original issue, because I don't know GTK. Using the gst bindings is an easy way for me to attach a signal handler that does not get detached by any side effects. The code above doesn't do anything else besides creating new object instances and attach some signal handlers. In fact it can be simplified even more:

package main

import (
	"log"
	"log/slog"
	"runtime"
	"time"

	"github.com/go-gst/go-gst/pkg/gst"
)

func mkObject() {
	el := gst.ElementFactoryMake("decodebin", "").(*gst.Bin)

	el.ConnectPadAdded(func(newPad *gst.Pad) {
		el.SetName("foobar")
		// log.Println("handler")
	})

	runtime.AddCleanup(el, func(_ struct{}) {
		log.Println("garbage collected element")
	}, struct{}{})

	return
}

func main() {
	gst.Init()

	slog.SetLogLoggerLevel(slog.LevelDebug)

	log.Println("go:")
	for range 5 {
		log.Println("loop")
		mkObject()

		runtime.GC()
		runtime.GC()
		runtime.GC()
	}

	runtime.GC()
	time.Sleep(2 * time.Second)
	runtime.GC()
}

RSWilli avatar Feb 26 '25 16:02 RSWilli

I think to the go GC the reference "cycle" looks as follows:

  • the intern.strong map contains the box with the c object pointer and closure.Registry. Since this is a global map everything in it is always reachable.
  • the closures.Registry contains all the FuncStacks, aka stack traces and actual closures
  • the closure in the FuncStack references the go object if the variable is used inside of it.

Thus the GC always thinks that the go object is always reachable, even if no other scope is able to call the closure. It cannot delete the map entry, because at any time the _gotk4_marshal might decide to call the closure (it wont, but the GC doesn't know this)

RSWilli avatar Feb 26 '25 16:02 RSWilli

For GTK the cycle is automatically broken if the signal handler is removed, like @sdroege said above:

Once gtk_widget_destroy() / gtk_window_destroy() is called (e.g. when closing the window), the two signal handlers are disconnected, which breaks the reference cycle in two places.

RSWilli avatar Feb 26 '25 16:02 RSWilli

I've just tested the branch with the snippet here https://github.com/diamondburned/gotk4/issues/154#issuecomment-2453547615

That's still leaking.

abenz1267 avatar Jul 13 '25 15:07 abenz1267