go icon indicating copy to clipboard operation
go copied to clipboard

proposal: weak: new package providing weak pointers

Open mknyszek opened this issue 2 months ago • 14 comments

What is a weak pointer?

Weak pointers (or weak references, as they are referred to in other languages) allow developers to reference memory without preventing the garbage collector from reclaiming it. To prevent visible dangling references, weak pointers become nil when the referenced memory is collected. Weak pointers can be converted to regular ("strong") pointers which do prevent the garbage collector from reclaiming memory, and allow typical use and access to that memory.

Weak pointers are typically a bit harder to work with than regular pointers because they can become nil at any time. Virtually every weak-to-strong conversion must be guarded by a nil check. Often, weak pointers will become nil at unexpected times.

Nonetheless, weak pointers exist in many languages because they are useful. The main use-cases for weak pointers are related to efficient memory management and reclamation. For example, efficiently managing memory for canonicalization maps, or for memory whose lifetime is bound to the lifetime of another object (akin to JavaScript's WeakMap). Another good use case for weak pointers is for hinting to the GC that it's OK to drop some resources because they're cheap to reconstruct later, especially if those resources have a large memory footprint.

Proposal

I propose adding a new weak package to the standard library with the following API.

// Pointer is a weak pointer to a value of type T.
//
// Two Pointer values compare equal if the pointers
// that they were created from compare equal. This property is retained even
// after the object referenced by the pointer used to create a weak reference
// is reclaimed.
//
// If multiple weak pointers are made to different offsets within same object
// (for example, pointers to different fields of the same struct), those pointers
// will not compare equal.
// If a weak pointer is created from an object that becomes unreachable, but is
// then resurrected due to a finalizer, that weak pointer will not compare equal
// with weak pointers created after resurrection.
//
// Calling Make with a nil pointer returns a weak pointer whose Value method
// always returns nil. The zero value of a Pointer behaves as if it was created
// by passing nil to Make and compares equal with such pointers.
type Pointer[T any] struct { ... }

// Make creates a weak pointer from a pointer to a value of type T.
func Make[T any](ptr *T) Pointer[T] { ... }

// Value returns the original pointer used to create the weak pointer.
// It returns nil if the value pointed to by the original pointer was reclaimed by
// the garbage collector.
// If a weak pointer points to an object with a finalizer, then Value will
// return nil as soon as the object's finalizer is queued for execution.
func (p Pointer[T]) Value() *T { ... }

Design discussion

Representation

The weak pointer we propose would be represented via an indirection object under the hood. This means that each pointer that has any weak pointers associated with it also has an associated 8 byte allocation that contains a single pointer. This pointer is the actual weak reference. This representation is very CPU efficient, since the GC can atomically set a single pointer to make all weak references to an object nil.

This 8 byte allocation is rarely a problem in practice since weak pointers are most often useful for memory efficiency anyway. For example, canonicalization maps are deduplicating memory already, so the extra 8 bytes per canonical copy is relatively cheap. However, it may be worth calling out this detail in the documentation as the C# language does.

This representation has other benefits. One benefit is that the API's equality semantics fall out very naturally. Weak pointers created from the same pointer remain equal even after that pointer no longer exists. This makes it possible to build maps keyed by weak pointers. Another benefit is its simplicity: very little of the GC has to change to support weak pointers.

Avoiding object resurrection

A subtle but important property of the weak pointer implementation is that it must not enable object resurrection. This would create many complications, including the fact that weakly-referenced objects would take 2 GC cycles to reclaim; and they could not be involved in cycles, or they would never be reclaimed (much like finalizers).

A simple way to implement weak references that maintains this property is to guard weak-to-strong conversions with ensuring the object is swept. A swept object cannot be resurrected, because it is always definitely reachable or unreachable, nothing in between. This means the weak-to-strong conversion path may need to sweep the object, but this happens at most once per GC cycle, and the Go runtime's sweeper is quite fast, so the cost shouldn't be noticeable in practice.

Why this specific interaction with finalizers?

The interaction between weak pointers and finalizers that resurrect objects is subtle. There are two choices: the weak pointer can go nil before the finalizer runs, or it can track the object's resurrection, and only go nil once the object is well and truly dead. In C# jargon, this is the difference between "short" weak reference and a "long" weak reference.

At first glance, the latter seems clearer since the weak pointer really only goes nil once nothing can reference the object. However, this semantics has a big problem, and it's subtle.

Today, the decision to resurrect an object in a finalizer is internal to an API’s implementation. If we allowed weak-to-strong conversions after finalization, then clients of an API could resurrect an object without the cooperation of its implementation, at which point the object's internals may be in an inconsistent state. This creates the potential for a race that is not immediately apparent to either the API authors or the client. Worse still, these races will be very rare and unlikely to be caught by testing, even with the race detector enabled. If the weak pointer goes nil before the finalizer runs, then such races are impossible.

It is telling that other garbage collected languages either make "short" weak references the default and do not recommend their "long" form (C#), or only offer the "short" form to begin with (Java).

Risks

Breaking the GC abstraction

The unique.Handle proposal explicitly rejected weak pointers as an alternative on the grounds that they are difficult to use. Fundamentally, this is because they break the GC abstraction by revealing the timing with which memory is actually reclaimed, and this is mainly a problem because that timing is unpredictable. The GC abstraction of "infinite memory" (only allocation, no apparent free) is important in general for composability.

While it is true that weak pointers are more difficult to use than regular pointers, we believe that we have picked an API (building on the experiences of other languages) that minimizes surprises, maintains composability, and has a simple implementation. This was discovered during the implementation of unique.Handle and it was compelling enough that it led to this proposal.

Although weak pointers will be misused in some cases, providing documentation, examples, and advice in the GC guide will go a long way to mitigating that.

mknyszek avatar May 21 '24 19:05 mknyszek