mmtk-core
Remove NULL ObjectReference
This is the first attempt to use the MEP process for changing a fundamental part of MMTk.
TL;DR
Currently MMTk assumes `ObjectReference` can be either a pointer to an object or NULL, which is not general for all VMs, especially the VMs that can store tagged values in slots. Meanwhile, MMTk core never processes NULL references. We propose removing `ObjectReference::NULL` so that `ObjectReference` is always a valid object reference.
Goal
- Remove `ObjectReference::NULL` so that `ObjectReference` always refers to an object.
Non-Goal
- It is not a goal to force `Address` to be a non-zero address. If needed, we can add a new type `NonZeroAddress` separately.
- It is not a goal to skip object graph edges pointing to singleton objects that represent NULL-like values (such as `nothing` and `missing` in Julia, `None` in CPython, and `null` and `undefined` in V8) during tracing. I have opened a separate issue for it: https://github.com/mmtk/mmtk-core/issues/1076
Success Metric
- No observable performance impact.
- Remove all invocations (including assertions) of `object.is_null()` from mmtk-core.
- Existing MMTk-VM APIs involving `ObjectReference::NULL` can still work, by using `None` or using other designs.
Motivation
Status-quo: All `ObjectReference` instances refer to objects, except `ObjectReference::NULL`.
Currently, `ObjectReference::NULL` is defined as `ObjectReference(0)`, and is used to represent NULL pointers. However, it
- is not general enough,
- pollutes the API design,
- is prone to missing & redundant NULL checks, and
- encourages non-idiomatic Rust code.
NULL and 0 are not general enough
Not all languages have NULL references. Haskell, for example, is a functional language and all variables are initialized before use.
For some VMs (such as CRuby, V8 and Lua), a slot may hold non-reference values. Ruby and V8 can put small integers in slots. Ruby can also put special values such as `true`, `false` and `nil` in slots.
Even if a language has NULL references of some sort, they are not always encoded the same way. Some VMs (such as V8 and Julia) even have different flavors of NULL or "missing value" types.
Language/VM | Thing | Representation | Note |
---|---|---|---|
OpenJDK | `null` | 0 | |
JikesRVM | `null` | 0 | |
CRuby | `nil` | 4 | `false` is represented as 0 |
V8 | `null` | ptr | Pointer to a singleton object of the `Oddball` type |
V8 | `undefined` | ptr | Pointer to a singleton object of the `Oddball` type |
Julia | `nothing` | ptr (`jl_nothing`) | Pointer to a singleton object of the `Nothing` type |
Julia | `missing` | ptr | Pointer to a singleton object of the `struct Missing` type, defined in Julia |
CPython | `None` | ptr (`Py_None`) | Pointer to a singleton object of `NoneType` |
CRuby encodes `nil` as 4 instead of 0. Python uses a valid reference to a singleton object `None` to represent missing values.
Some languages have multiple representations of non-existing values. JavaScript has both `null` and `undefined`. Julia has both `nothing` and `missing`.
For the reasons listed above, a single constant `ObjectReference::NULL` with numerical value `0` is not general enough to cover the cases of missing references or special non-reference values in languages and VMs.
NULL pollutes the API design
Previously designed for Java, MMTk assumes that
- a slot may hold a NULL pointer, and
- NULL is represented as 0.
This has various influences on the API design and the internal implementation of MMTk-core.
Processing slots (edges)
This issue is discussed in greater detail in https://github.com/mmtk/mmtk-core/issues/1031. It has been fixed in https://github.com/mmtk/mmtk-core/pull/1032. Before it was fixed, the method `ProcessEdgesWork::process_edge` behaved like this:
// Outdated code from ProcessEdgesWork::process_edge
let object = slot.load();
let new_object = self.trace_object(object);
slot.store(new_object);
In these three lines,
- `slot.load()` loads from the slot verbatim, interpreting 0 as `ObjectReference::NULL`.
- `trace_object` handles `NULL` "gracefully" by returning `NULL`, too.
- `slot.store(new_object)` may overwrite `NULL` with `NULL`, which was supposed to be "benign".
Such assumptions break if (1) the VM does not use 0 to encode NULL, or (2) the VM can hold tagged non-reference values in slots. CRuby is affected by both.
PR https://github.com/mmtk/mmtk-core/pull/1032 fixes this problem by allowing `slot.load()` to return `ObjectReference::NULL` even if `nil` is encoded as 4, or if the slot holds small integers, and `process_edge` simply skips such slots. It is now general enough to support V8, Julia and CRuby. However, the use of `ObjectReference::NULL` to represent skipped fields is not idiomatic in Rust. We should use `Option<ObjectReference>` instead.
ReferenceProcessor
Note: In the future we may move `ReferenceProcessor` and `ReferenceGlue` out of mmtk-core. See: https://github.com/mmtk/mmtk-core/issues/694
`ReferenceProcessor` is designed for Java, and a Java `Reference` (soft/weak/phantom ref) can be cleared by setting the referent to `null`. The default implementation of `ReferenceGlue` works this way. `ReferenceGlue::clear_referent` sets the referent to `ObjectReference::NULL`, and `ReferenceProcessor` checks if a `Reference` is cleared by calling `referent.is_null()`.
It works for Java, but not for Julia, because Julia uses a pointer `jl_nothing` to represent cleared references. Although `ReferenceGlue::clear_referent` can be overridden, it was not enough. Commit https://github.com/mmtk/mmtk-core/commit/9648aed62621f33026f1807573d707965c3a88fe added `ReferenceGlue::is_referent_cleared` so that `ReferenceProcessor` can compare the referent against `jl_nothing` instead of `ObjectReference::NULL`.
p.s. `ReferenceGlue::clear_referent` is the only place in mmtk-core (besides tests) that uses the constant `ObjectReference::NULL`. This means the major part of mmtk-core does not work with `NULL` references from the VM.
NULL-checking is hard to do right
`ObjectReference` can be `NULL`, and the type system cannot tell if a value of type `ObjectReference` is `NULL` or not. As a consequence, programmers have to insert NULL-checking statements everywhere. It is very easy to miss necessary checks and add redundant ones.
Missing NULL checks
In the reference processor, the following lines load an `ObjectReference` from a weak reference object, and try to get its forwarded address.
// Outdated code from ReferenceProcessor::forward
let old_referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(reference); // Is `old_referent` cleared?
let new_referent = ReferenceProcessor::get_forwarded_referent(trace, old_referent);
<E::VM as VMBinding>::VMReferenceGlue::set_referent(reference, new_referent);
The code snippet calls `get_forwarded_referent` regardless of whether `old_referent` has been cleared or not. Because `get_forwarded_referent` calls `trace_object`, and `trace_object` used to return `NULL` if passed `NULL`, the code used to be benign for Java. However, the code will not work if the VM does not use 0 to encode a null reference, or the slot can hold tagged non-reference values, for the reasons discussed above. Since the only VM that overrides `ReferenceGlue::is_referent_cleared` (Julia) does not use MarkCompact, this bug went undetected.
This bug has been fixed in https://github.com/mmtk/mmtk-core/pull/1032, but it shows how hard it is to manually check for `NULL` in all possible places.
Unnecessary NULL checks
Inside MMTk core, the most problematic functions are the `trace_object` methods of various spaces.
- `trace_object`: Some spaces check `object.is_null()` in `trace_object` and return `NULL` if it is null. But it is unnecessary because after SFT or `PlanTraceObject` dispatches the `trace_object` call to a concrete space by the address of `ObjectReference`, it is guaranteed not to be NULL.
Some API functions check for `is_null()` because we defined `ObjectReference` as NULL-able. Those API functions don't make sense for NULL pointers.
- `is_in_mmtk_space(object)`: It checks if the argument is NULL only because the `ObjectReference` type is NULL-able. Any VMs that use this API function to distinguish references of MMTk objects from pointers from `malloc`, etc., will certainly check NULL first before doing anything else.
- `ObjectReference::is_reachable()`: It checks `is_null()` before using SFT to dispatch the call. If `ObjectReference` is not NULL-able in the first place, the NULL check will be unnecessary.
NULL encourages non-idiomatic Rust code
In Rust, the idiomatic way to represent the absence of a value is `None` (of type `Option<T>`). However, `ObjectReference::NULL` is sometimes used to represent the absence of `ObjectReference`.
In MarkCompactSpace: Our current MarkCompact implementation stores a forwarding pointer in front of each object for forwarding. When the forwarding pointer is not set, that slot holds an `ObjectReference::NULL` (value 0). But what it really means is that "there is no forwarded object reference associated with the object".
In `Edge::load()`: As we discussed before, since https://github.com/mmtk/mmtk-core/pull/1032, `Edge::load()` returns `ObjectReference::NULL` to mean "the slot is not holding an object reference", even if the slot is holding a tagged non-reference value or a null reference not encoded as numerical 0. In idiomatic Rust, the return type of `Edge::load()` should be `Option<ObjectReference>` and it should return `None` if the slot is not holding an object reference. We are currently not using `Option<ObjectReference>` as the return type because `ObjectReference` is currently backed by `usize` and can be 0. Consequently, `Option<ObjectReference>` has to be larger than a word, and will have additional overhead.
Description
We propose removing the constant `ObjectReference::NULL`, and making `ObjectReference` non-NULL-able.
Making ObjectReference non-zero
For performance reasons, we shall change the underlying type of `ObjectReference` from `usize` to `std::num::NonZeroUsize`.
#[repr(transparent)]
pub struct ObjectReference(NonZeroUsize);
There is another good reason for forbidding 0: no objects can be allocated at or near the address 0. (That assumes `ObjectReference` is an address. See https://github.com/mmtk/mmtk-core/issues/1044)
By doing this, `Option<ObjectReference>` will have the same size as `usize` due to null pointer optimization. Passing `Option<ObjectReference>` between functions (including across the FFI boundary) should have no overhead compared to passing `ObjectReference` directly.
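The size claim can be checked directly. The following is a minimal sketch that uses a locally defined stand-in for `ObjectReference` (not the actual mmtk-core type); the helper name `option_is_free` is made up for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Stand-in for the proposed type; not the actual mmtk-core definition.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical helper: returns true if wrapping the reference in `Option`
/// costs no extra space compared to a plain machine word.
pub fn option_is_free() -> bool {
    size_of::<Option<ObjectReference>>() == size_of::<usize>()
        && size_of::<ObjectReference>() == size_of::<usize>()
}
```

A check like this could also serve as one of the unit tests mentioned in the Testing section.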
An `ObjectReference` can be converted from `Address` in two ways.
impl ObjectReference {
    // We had this method before, but it now returns `Option<ObjectReference>`.
    pub fn from_raw_address(addr: Address) -> Option<ObjectReference> {
        NonZeroUsize::new(addr.0).map(ObjectReference)
    }

    // This is new. It assumes `addr` cannot be zero, therefore it is `unsafe`.
    pub unsafe fn from_raw_address_unchecked(addr: Address) -> ObjectReference {
        debug_assert!(!addr.is_zero());
        ObjectReference(NonZeroUsize::new_unchecked(addr.0))
    }
}
Refactoring the `Edge` trait
The `Edge` trait will be modified so that
- `Edge::load()` now returns `Option<ObjectReference>`. If a slot does not hold an object reference (`null`, `nil`, `true`, `false`, small integers, etc.), it shall return `None`.
- `Edge::store(object: ObjectReference)` still takes an `ObjectReference` as parameter because we can only forward valid references.
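To illustrate the intended shape of `load()` and `store()`, here is a rough sketch for a VM with tagged slots. Everything in it is an assumption made for illustration: `TaggedSlot`, its encoding (lowest bit tags small integers, `false` is 0, `nil` is 4, loosely modelled on CRuby), and `process_slot` are hypothetical, and `ObjectReference` is a local stand-in rather than the mmtk-core type.

```rust
use std::num::NonZeroUsize;

// Stand-in for the proposed non-nullable type; not the mmtk-core definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical slot of a VM that tags small integers with the lowest bit
/// and encodes `false` as 0 and `nil` as 4.
pub struct TaggedSlot {
    pub raw: usize,
}

impl TaggedSlot {
    const FALSE: usize = 0;
    const NIL: usize = 4;

    /// Returns `Some` only if the slot holds a real object reference.
    pub fn load(&self) -> Option<ObjectReference> {
        let is_small_int = self.raw & 1 == 1;
        if is_small_int || self.raw == Self::NIL || self.raw == Self::FALSE {
            return None;
        }
        NonZeroUsize::new(self.raw).map(ObjectReference)
    }

    /// Stores a (possibly forwarded) reference; the argument can never be null.
    pub fn store(&mut self, object: ObjectReference) {
        self.raw = object.0.get();
    }
}

/// Mirrors the shape of `process_edge`: slots without references are skipped.
/// `trace_object` is a caller-supplied closure standing in for real tracing.
pub fn process_slot(
    slot: &mut TaggedSlot,
    trace_object: impl Fn(ObjectReference) -> ObjectReference,
) {
    if let Some(object) = slot.load() {
        let new_object = trace_object(object);
        slot.store(new_object);
    }
}
```

With this shape, a slot holding `nil` or a small integer is never passed to `trace_object` and is never overwritten.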
Refactoring the reference processor
Note: Ultimately `ReferenceGlue` and `ReferenceProcessor` will be moved outside mmtk-core. Here we describe a small-scale refactoring for this MEP.
The `ReferenceGlue` and `ReferenceProcessor` will be modified so that
- `ReferenceGlue::get_referent` now returns `Option<ObjectReference>`. It returns `None` if the reference is already cleared.
- `ReferenceGlue::is_referent_cleared` will be removed.
- `ReferenceGlue::clear_referent` will no longer have a default implementation because mmtk-core no longer assumes the reference object represents "the referent is cleared" by assigning 0 to the referent field.
- `ReferenceProcessor` will no longer call `is_referent_cleared`, but will check if `get_referent` returns `None` or `Some(referent)`.
`ReferenceProcessor` also contains many assertions to ensure references are not NULL. Those can be removed.
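The effect of these changes on the processing logic can be sketched as follows. This is a simplified illustration, not the mmtk-core implementation: the glue operations are written as free functions, `WeakRef` and `process_reference` are hypothetical names, and `forward` is a caller-supplied closure standing in for tracing.

```rust
use std::num::NonZeroUsize;

// Stand-in for the proposed non-nullable type; not the mmtk-core definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical weak reference object of some VM.
#[derive(Debug, PartialEq, Eq)]
pub struct WeakRef {
    /// `None` means the reference has been cleared. A real VM would encode
    /// this in its own way (0, `jl_nothing`, ...).
    pub referent: Option<ObjectReference>,
}

// Simplified glue: the operations discussed above, as free functions.
pub fn get_referent(reference: &WeakRef) -> Option<ObjectReference> {
    reference.referent
}

pub fn set_referent(reference: &mut WeakRef, referent: ObjectReference) {
    reference.referent = Some(referent);
}

pub fn clear_referent(reference: &mut WeakRef) {
    reference.referent = None;
}

/// Processes one weak reference. `forward` returns the new location of a
/// live referent, or `None` if the referent is dead.
pub fn process_reference(
    reference: &mut WeakRef,
    forward: impl Fn(ObjectReference) -> Option<ObjectReference>,
) {
    // The `Option` return type forces this check; a cleared referent cannot
    // be passed to `forward` by accident.
    let old_referent = match get_referent(reference) {
        Some(referent) => referent,
        None => return,
    };
    match forward(old_referent) {
        Some(new_referent) => set_referent(reference, new_referent),
        None => clear_referent(reference),
    }
}
```

The early return on `None` is the check that was missing in the outdated `ReferenceProcessor::forward` code quoted in the Motivation section.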
Removing unnecessary NULL checks
The PR https://github.com/mmtk/mmtk-core/pull/1032 already removed the `NULL` checks related to `trace_object`.
Public API functions `is_in_mmtk_space` and `ObjectReference::is_reachable` will no longer do NULL checks because `ObjectReference` cannot be NULL in the first place.
The forwarding pointer in MarkCompact
Instead of loading the forwarding pointer as `ObjectReference` directly, we load the forwarding pointer as an address, and convert it to `Option<ObjectReference>`. The conversion itself is a no-op.
fn get_header_forwarding_pointer(object: ObjectReference) -> Option<ObjectReference> {
    let addr = unsafe { Self::header_forwarding_pointer_address(object).load::<Address>() };
    ObjectReference::from_raw_address(addr)
}
`MarkCompactSpace::compact()` calls `get_header_forwarding_pointer(obj)`. It always needs to check if `obj` has a forwarding pointer because `obj` may be dead, and dead objects don't have forwarding pointers (i.e. `get_header_forwarding_pointer(obj)` returns `None` if `obj` is dead). It used to check with `forwarding_pointer.is_null()`.
Write barrier
Main issue: https://github.com/mmtk/mmtk-core/issues/1038
The barrier function `Barrier::object_reference_write` takes `ObjectReference` as parameters:
fn object_reference_write(
    &mut self,
    src: ObjectReference,
    slot: VM::VMEdge,
    target: ObjectReference,
) {
    self.object_reference_write_pre(src, slot, target);
    slot.store(target);
    self.object_reference_write_post(src, slot, target);
}
Here `target` is NULL-able because a user program may execute `src.slot = null`. (More generally, a JS program may have `src.slot = "str"; src.slot = 42;`, overwriting a reference with a number.) The type of `target` can be changed to `Option<ObjectReference>`. However, the main problem is that `slot.store()` no longer accepts NULL pointers. The root problem is the design of `Barrier::object_reference_write`, and that needs to be addressed separately. See https://github.com/mmtk/mmtk-core/issues/1038
The `object_reference_write_pre` and `object_reference_write_post` methods should still work after changing `target` to `Option<ObjectReference>`. The "pre" and "post" functions do not modify the slot.
For now, we may keep `Barrier::object_reference_write` as is, but it will not be applicable if `target` is NULL. Currently no officially supported bindings use `Barrier::object_reference_write`. Other bindings should call `object_reference_write_pre` and `object_reference_write_post` separately and manually store the new value to the slot until https://github.com/mmtk/mmtk-core/issues/1038 is properly addressed.
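As a sketch of that workaround, a binding could wrap the two hooks around its own store. The names `RecordingBarrier` and `write_field` are hypothetical, the hook signatures are simplified (the slot parameter is omitted), and encoding "no reference" as 0 is an assumption of this example rather than something mmtk-core prescribes.

```rust
use std::num::NonZeroUsize;

// Stand-in for the proposed non-nullable type; not the mmtk-core definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical barrier that only records which hooks ran.
#[derive(Default)]
pub struct RecordingBarrier {
    pub events: Vec<&'static str>,
}

impl RecordingBarrier {
    // Simplified signatures: the slot parameter of the real hooks is omitted.
    pub fn object_reference_write_pre(
        &mut self,
        _src: ObjectReference,
        _target: Option<ObjectReference>,
    ) {
        self.events.push("pre");
    }

    pub fn object_reference_write_post(
        &mut self,
        _src: ObjectReference,
        _target: Option<ObjectReference>,
    ) {
        self.events.push("post");
    }
}

/// Binding-side field write: pre hook, VM-specific store, post hook.
pub fn write_field(
    barrier: &mut RecordingBarrier,
    src: ObjectReference,
    field: &mut usize,
    target: Option<ObjectReference>,
) {
    barrier.object_reference_write_pre(src, target);
    // The binding, not mmtk-core, chooses the encoding of "no reference".
    // Using 0 here is an assumption of this sketch.
    *field = target.map_or(0, |object| object.0.get());
    barrier.object_reference_write_post(src, target);
}
```

Because the store happens in binding code, the binding is free to write `null`, `nil` or a tagged integer in whatever encoding its VM uses.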
Impact on Performance
This MEP should have no visible impact on performance. Preliminary performance evaluation supports this: https://github.com/mmtk/mmtk-core/pull/1064
Because of null pointer optimization, `Option<ObjectReference>`, `ObjectReference`, `Option<NonZeroUsize>`, `NonZeroUsize` and `usize` all have the same layout.
When converting from `Address` to `ObjectReference`, neither `ObjectReference::from_raw_address` (returns `Option<ObjectReference>`) nor `ObjectReference::from_raw_address_unchecked` (returns `ObjectReference` directly) has overhead. But unwrapping the `Option<ObjectReference>` involves a run-time check.
The overhead of the `None` check (pattern matching or `opt_objref.unwrap()`) should be very small. But if the zero check is a performance bottleneck, we can always use `ObjectReference::from_raw_address_unchecked` as a fall-back, provided that we know it can't be zero.
There are three known use cases of `Option<ObjectReference>` in mmtk-core:
- `slot.load()` returns `None` if a slot doesn't hold a reference,
- `ReferenceGlue::get_referent()` returns `None` if a (weak) `Reference` is cleared, and
- the forwarding pointers in MarkCompact.
In all those cases, the checks for `None` are necessary for correctness. Previously, those places checked against `ObjectReference::NULL`.
Impact on Software Engineering
mmtk-core
With `ObjectReference` guaranteed to be non-NULL, `Option<ObjectReference>` can be used to indicate an `ObjectReference` may not exist. As discussed above, typical use cases of `Option<ObjectReference>` are (1) `slot.load()`, (2) `ReferenceGlue::get_referent()` and (3) the forwarding pointer in MarkCompact. The use of `Option<T>` forces a check to convert `Option<ObjectReference>` to `ObjectReference`. By doing this, we can avoid bugs related to missing or redundant NULL checks.
Bindings
Some code needs to be changed in the OpenJDK binding due to this API change. The OpenJDK binding uses `struct OpenJDKEdge` (which implements `trait Edge`) to represent a slot in OpenJDK. Because `trait Edge` is designed from the perspective of mmtk-core, the `Edge` trait itself does not support storing `NULL` into the slot. I have to add an OpenJDK-specific method `OpenJDKEdge::store_null()` to store `null` to the slot in an OpenJDK-specific way. This is actually expected because not all VMs have `null` pointers, nor do they encode `null`, `nil`, `nothing`, etc. in the same way. `OpenJDKEdge::store_null()` also bypasses some bit operations related to compressed OOPs. This change added complexity to the OpenJDK binding, but I think it is the right way to do it.
Another quirk in software engineering is that we sometimes have to call `unsafe { ObjectReference::from_raw_address_unchecked(addr) }` to bypass the check against zero because we (as humans) are sure `addr` is never zero. That happens in two cases:
- When we construct an `ObjectReference` from the result of `alloc` or `alloc_copy`. We know newly allocated objects cannot have zero as their addresses, but the Rust language cannot figure it out unless we add `NonZeroAddress`, too.
  - Note that when calling `alloc`, MMTk may find it is out of memory. Currently, MMTk core will call `Collection::out_of_memory`, and then `alloc` will return `Address(0)` to the caller. The default implementation of `Collection::out_of_memory` panics, so the binding may assume `alloc` never returns `Address(0)` on normal returns. But if the binding overrides `Collection::out_of_memory`, it will need to actually check if the return value of `alloc` is 0 instead of using the unsafe `from_raw_address_unchecked` function.
- In the OpenJDK binding, when we decode a compressed pointer, we now have to check the compressed OOP against zero manually and call `unsafe { ObjectReference::from_raw_address_unchecked(BASE.load(Ordering::Relaxed) + ((v as usize) << SHIFT.load(Ordering::Relaxed))) }`, too. The Rust language cannot prove that the result can't be zero if `v` is not zero, but we as humans know the check against zero is unnecessary.
The presence of `unsafe { ... }` makes the code look unsafe, but it is actually as safe (or as unsafe) as before.
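For a binding that overrides `Collection::out_of_memory`, the checked conversion can carry the out-of-memory case to the caller. A minimal sketch, with stand-in `Address` and `ObjectReference` types and a hypothetical `OutOfMemory` error type and `object_from_alloc_result` helper:

```rust
use std::num::NonZeroUsize;

// Stand-ins for the mmtk-core types, defined locally for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Address(pub usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    /// Checked conversion, as proposed above.
    pub fn from_raw_address(addr: Address) -> Option<ObjectReference> {
        NonZeroUsize::new(addr.0).map(ObjectReference)
    }
}

/// Hypothetical error type a binding might use.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfMemory;

/// Converts the address returned by an `alloc`-like call. An `Address(0)`
/// result is reported as an error instead of being assumed away.
pub fn object_from_alloc_result(addr: Address) -> Result<ObjectReference, OutOfMemory> {
    ObjectReference::from_raw_address(addr).ok_or(OutOfMemory)
}
```

This keeps `unsafe` out of the path where a zero address is actually possible.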
Risks
Long Term Performance Risks
Converting `Address` to `ObjectReference` has overhead only if we don't know whether the address can be zero or not. (We can always use `unsafe { ObjectReference::from_raw_address_unchecked(addr) }` if we know `addr` cannot be zero.)
This will remain true in the future. If we don't know if it is zero at compile time, then run-time checking will be necessary, and this MEP enforces the check to be done. Such overhead should always exist regardless of whether we allow `ObjectReference` to be `NULL` or not (and the overhead may be erroneously omitted if we fail to add a necessary NULL check).
Long Term Software Engineering Risks
`Option<ObjectReference>` across FFI boundary
One potential problem is the convenience of exposing `Option<ObjectReference>` to C code via FFI. Ideally, C programs should use `uintptr_t` for `Option<NonZeroUsize>`, with 0 representing `None`. However, Rust currently does not define the layout of `Option<NonZeroUsize>`. Even though the only possible encoding of `None` (of type `Option<NonZeroUsize>`) is `0usize`, the Rust reference still states that transmuting `None` (of type `Option<NonZeroUsize>`) to `usize` has undefined behavior. So we have to manually write code to do the conversion, mapping `None` to `0usize`. Despite that, the conversion functions should be easy to implement. We can implement two functions to make the conversion easy:
let word: usize = ffi_utils::objref_to_usize_zeroable(maybe_object);
let maybe_object: Option<ObjectReference> = ffi_utils::usize_to_objref_zeroable(word);
That should be concise enough for most use cases.
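One possible implementation of the two helpers is sketched below. It is an illustration only: the functions are written as free functions (without the `ffi_utils::` prefix), and `ObjectReference` is a local stand-in for the mmtk-core type.

```rust
use std::num::NonZeroUsize;

// Stand-in for the proposed non-nullable type; not the mmtk-core definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Maps `None` to 0 and `Some(object)` to its non-zero raw value.
pub fn objref_to_usize_zeroable(object: Option<ObjectReference>) -> usize {
    object.map_or(0, |o| o.0.get())
}

/// Maps 0 to `None` and any other word to `Some(object)`.
pub fn usize_to_objref_zeroable(word: usize) -> Option<ObjectReference> {
    NonZeroUsize::new(word).map(ObjectReference)
}
```

Neither function uses `transmute`, so they do not depend on any layout guarantee for `Option<ObjectReference>`.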
Currently, very few public API functions expose the `Option<ObjectReference>` type. They are:
- `ObjectReference::get_forwarded_referent(self) -> Option<Self>`
- `vo_bit::is_vo_bit_set_for_addr(address: Address) -> Option<ObjectReference>`: Although public, VM bindings tend to use `is_mmtk_object` instead.
With this MEP implemented,
- `Edge::load() -> Option<ObjectReference>` will be a new use case.
The software engineering burden should be reasonable for those three API functions. Specifically, the OpenJDK binding currently uses neither `get_forwarded_referent` nor `is_vo_bit_set_for_addr`, and `Edge::load()` is trivial to refactor.
If, in the future, mmtk-core introduces more API functions that involve `Option<ObjectReference>` (which I don't think is likely to happen), we (or the VM bindings) may introduce macros to automate the conversion.
VM Binding considerations
VM bindings can no longer use the `ObjectReference` type from mmtk-core to represent their reference types if the VM allows NULL references. Binding writers may find it inconvenient because they need to define their own nullable reference types. But existing VMs already have related types. The OpenJDK binding already has the `oop` type, and we know it may be encoded as `u32` or `usize` depending on whether CompressedOOP is enabled. The Ruby binding has the `VALUE` type which is backed by `unsigned long` and can encode a tagged union.
I don't worry about new bindings because if the developer knows an `ObjectReference` must refer to an object and cannot be NULL or a small integer, they will roll their own nullable or tagged reference type and get things right from the start. The problem may be with existing bindings (OpenJDK, Ruby, Julia and V8). If they assumed `ObjectReference` may be NULL or may hold tagged references, they need to be refactored.
Impact on Public API
The most obvious change is the `Edge` trait. `Edge::load()` will return `Option<ObjectReference>`, and `Edge::store(object)` will ensure the argument is not NULL. As stated above, `OpenJDKEdge::load()` has been trivially refactored to adapt to this change.
Other public API functions will no longer accept a NULL `ObjectReference`, but most public API functions never accepted NULL as an argument before.
The main problem is `object_reference_write` and its `_pre`, `_post` and `_slow` variants. As we discussed in the Write barrier section, `object_reference_write` will stop working for VMs that support null pointers or tagged pointers because we can no longer store `NULL` to an `Edge`. However, VMs are still able to use write barriers by calling the `_pre` and `_post` functions separately, or inlining the fast path and calling the `_slow` function separately.
Currently,
- mmtk-openjdk always calls the `_pre` and `_post` functions separately.
- mmtk-jikesrvm and mmtk-ruby do not support any generational GC yet.
- mmtk-julia only calls the `_post` and the `_slow` functions.
Since currently no officially supported VM bindings use `object_reference_write` directly, there is no immediate impact.
But in the long term, we should redesign the write barrier functions to make them more general. See: https://github.com/mmtk/mmtk-core/issues/1038
Testing
We may add unit tests to ensure
- `Option<ObjectReference>`, `ObjectReference`, `NonZeroUsize` and `usize` all have the same size.
- The conversion between `Address` and `Option<ObjectReference>` properly handles `Address(0)` and `None`.
And we should add micro benchmarks to ensure
- Conversion between `Address` and `Option<ObjectReference>` has no performance penalty.
- Unsafe conversion from `Address` to `ObjectReference` has no performance penalty.
- Converting `Some(ObjectReference)` to `ObjectReference` (via matching) is efficient.
- Unwrapping an `Option<ObjectReference>` is efficient.
It is better if we can verify the generated assembly code of the "no penalty" cases to make sure they are no-ops.
No tests need to be added around `trace_object` implementations because the Rust language will ensure the underlying `NonZeroUsize` will never hold the value 0.
Currently one test involves `ObjectReference::NULL`, that is, the test for `is_in_mmtk_space`. It tests if the function returns `false` when the argument is `ObjectReference::NULL`. We may remove that test case because we removed `ObjectReference::NULL`.
Alternatives
We may do nothing, keeping `ObjectReference::NULL` and using it to represent a missing `ObjectReference`. MMTk is still capable of performing GC and supporting our currently supported VMs without this refactoring. But the problems of this approach have been listed in the Motivation section, namely not being general enough, polluting the API, making NULL checks hard to get right, and being non-idiomatic in Rust.
We may do the opposite, i.e. allowing `ObjectReference` to represent not only `NULL` encoded as 0, but also language-specific NULL variants such as `nil`, `nothing`, `missing`, `undefined`, etc., and allow the binding to define the possible NULL-like values. But if we take this approach, MMTk core will not only have to check for `NULL` everywhere, but also need to check for other special NULL-like values everywhere, too, making software engineering more difficult.
Assumptions
Currently `ObjectReference` is backed by `usize`, and all existing VM bindings implement `ObjectReference` as a pointer to an object, or to some offset from the start of an object. While this design (implementing `ObjectReference` as a pointer to an object, possibly with an offset) is able to support fat pointers, offsetted pointers, and handles, we acknowledge that it may not be the only possible design. For example, we currently assume that `ObjectReference` can only represent references, but not non-reference values such as NULL, small integers, `true`, `false`, `nil`, `undefined`, etc.
If, in the future, we change the definition so that `ObjectReference` can also hold `NULL`, `nil`, `true`, `false`, small integers, etc., we will need to think about this MEP again. I (Kunshan) personally strongly disagree with the idea of letting `ObjectReference` hold a tagged non-reference value, such as a small integer. If `ObjectReference` can be `nil`, `true`, `false`, and small integers, then mmtk-core will need to check whether a given `ObjectReference` is such a special non-ref value everywhere, which is even worse than adding NULL checks everywhere.
MMTk core makes no assumption about how an object reference is stored in a slot. The VM (such as OpenJDK) may store compressed pointers in some slots. That is abstracted out by the `Edge::load()` method which decompresses the pointer and returns a `Some(ObjectReference)` or `None`. If the VM finds the slot is holding a NULL reference after decoding (or before decoding if `0u32` also represents NULL, as in OpenJDK), it still returns `None`.
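A slot with compressed pointers could implement that behaviour roughly as follows. This is a sketch under stated assumptions: `CompressedSlot` is hypothetical, `base` and `shift` are passed as parameters instead of being read from globals, and `ObjectReference` is a local stand-in for the mmtk-core type.

```rust
use std::num::NonZeroUsize;

// Stand-in for the proposed non-nullable type; not the mmtk-core definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical slot holding a 32-bit compressed pointer.
pub struct CompressedSlot {
    pub raw: u32,
}

impl CompressedSlot {
    /// Decodes the slot. `base` and `shift` are passed explicitly to keep
    /// the sketch free of global state.
    pub fn load(&self, base: usize, shift: u32) -> Option<ObjectReference> {
        // 0 encodes null before decoding, so it is tested first.
        if self.raw == 0 {
            return None;
        }
        let decoded = base + ((self.raw as usize) << shift);
        NonZeroUsize::new(decoded).map(ObjectReference)
    }
}
```

This variant keeps the checked conversion after decoding; a binding that is certain the decoded address is non-zero could use the unchecked conversion instead, as described in the Bindings section.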
Related Issues
Preliminary implementation PRs:
- mmtk-core: https://github.com/mmtk/mmtk-core/pull/1064
- mmtk-openjdk: https://github.com/mmtk/mmtk-openjdk/pull/265
Other related issues and PRs:
- https://github.com/mmtk/mmtk-core/issues/1031: Supporting VMs where slots may hold tagged values. It is the main motivation of this MEP.
- https://github.com/mmtk/mmtk-core/issues/1038: Problems with the subsuming write barrier function.
- https://github.com/mmtk/mmtk-core/pull/1032: (merged) `Edge::load()` can now return NULL for tagged values so that the slot can be skipped. It can be improved by this MEP if we use `None` instead of NULL. Also fixes a missing NULL check in the ReferenceProcessor.
- https://github.com/mmtk/mmtk-core/issues/1044: Archived discussions about the definition of `ObjectReference`. Can `ObjectReference` be addresses, handles, tagged pointers, etc.?
- https://github.com/mmtk/mmtk-core/issues/1076: A related but orthogonal topic about skipping some edges to singleton NULL-like objects during tracing.
`Option` is not FFI-safe (even though it should be). I learnt this the hard way. We can define our own "`Option`" type but then it's annoying to convert between the two.
> `Option` is not FFI-safe (even though it should be). I learnt this the hard way. We can define our own "`Option`" type but then it's annoying to convert between the two.
MMTk never provides an official C API. So it is the bindings that wrap MMTk API functions and provide them as functions callable from C. If an MMTk API uses `Option<ObjectReference>` as a parameter, and assuming `ObjectReference` is backed by `NonZeroUsize`, then the binding should still use 0 (or NULL) to represent a `None` in this case. For example,
// In MMTk core
fn some_api_function(maybe_object: Option<ObjectReference>) -> Option<ObjectReference> { ... }
Its wrapper should be
// In VM binding
extern "C" fn mmtk_some_api_function(maybe_object: usize) -> usize {
    let arg: Option<ObjectReference> = match NonZeroUsize::new(maybe_object) {
        None => None,
        Some(nzu) => Some(ObjectReference(nzu)),
    };
    let result: Option<ObjectReference> = some_api_function(arg);
    match result {
        None => 0,
        Some(object) => object.as_usize(),
    }
}
Or more concisely,
extern "C" fn mmtk_some_api_function(maybe_object: usize) -> usize {
    some_api_function(ObjectReference::from_usize_zeroable(maybe_object)).to_usize_zeroable()
}
where `{from,to}_usize_zeroable` can be defined to convert between `usize` values that may be zero and `Option<ObjectReference>`. But anyway, we don't need to expose `Option<ObjectReference>` through foreign functions.
I kind of disagree that it is not required to be exposed. It is more natural to a binding developer to already use the type MMTk uses, i.e. `ObjectReference` (or `Option<ObjectReference>`) in their own API, instead of converting to-and-from `usize` which can get pretty annoying very fast.
> ... It is more natural to a binding developer to already use the type MMTk uses, i.e. `ObjectReference` (or `Option<ObjectReference>`) in their own API,...
If "their own API" is in Rust, they can use `Option<ObjectReference>` in their API, too.
But if the binding needs an API for the runtime implemented in another language, or passes `Option<ObjectReference>` to (AoT or JIT) compiled code, they need to think about the encoding of `None` and `Some(object)` anyway. In theory, if their counterpart of the `ObjectReference` type is nullable, that'll be a different type from `ObjectReference` if MMTk's `ObjectReference` is not nullable. I prefer letting the VM make it clear that they are different.
And yes. Having `Option<ObjectReference>` have the same layout as `usize`, with `None` encoded as 0, will be ideal because that will greatly reduce the amount of code for converting between them (although I think they will eventually be optimized to no-ops by the compiler). But for now, according to Rust's reference, transmuting `None` (of `Option<NonZeroUsize>`) to `usize` is still undefined behavior, although 0 is the only possible encoding of `None` (because all non-zero `usize` values are possible for `NonZeroUsize`). Unless Rust makes a promise for the representation of `None` in this case, we will have to write our custom converters.
> If "their own API" is in Rust, they can use `Option<ObjectReference>` in their API, too.
That's not FFI then. It's just Rust, which is perfectly fine. Unfortunately, most modern production systems are written in C or C++ so we need to think about FFI regardless.
I think then the ideal is that `ObjectReference` is completely opaque to MMTk.
> > If "their own API" is in Rust, they can use `Option<ObjectReference>` in their API, too.
> That's not FFI then. It's just Rust, which is perfectly fine. Unfortunately, most modern production systems are written in C or C++ so we need to think about FFI regardless.
> I think then the ideal is that `ObjectReference` is completely opaque to MMTk.
Well, not 100% opaque. It has to satisfy some criteria. More detailed discussion is here: https://github.com/mmtk/mmtk-core/issues/1044 But I don't object to the idea that there are other possible ways to implement `ObjectReference` besides its current representation (backed by `usize` or `NonZeroUsize`).
In the preliminary implementation https://github.com/mmtk/mmtk-core/pull/1064, `ObjectReference` has two methods to convert from `Address`:
pub fn from_raw_address(addr: Address) -> Option<ObjectReference>;
pub unsafe fn from_raw_address_unchecked(addr: Address) -> ObjectReference;
`from_raw_address_unchecked` is used in two places in mmtk-core:
- In the sweeping code in the native mark-sweep space. When sweeping a block, we convert cell addresses (with or without an offset) into ObjectReference instances. As long as the block itself does not start at address 0, none of its cells can have address 0, so we can safely use the unchecked conversion.
- The forwarding pointer. The current code loads a usize atomically from LOCAL_FORWARDING_POINTER_SPEC, converts the usize to Address, and then converts the Address to ObjectReference. Because we know that the forwarding pointer can never be null (it is always the address of a newly allocated to-space copy of an object), we don't need to check for 0. We may allow loading the metadata as NonZeroUsize (currently only u8, u16, u32, u64 and usize implement MetadataValue) so that we can directly make an ObjectReference from a NonZeroUsize. Alternatively, we can make forwarding pointers non-metadata and bypass the limitation of our current metadata implementation.
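To illustrate the difference between the two constructors, here is a simplified sketch. Address and ObjectReference are stand-ins for the real types, and cells_in_block is a hypothetical helper that mimics the sweeping case, not the actual mark-sweep code:

```rust
use std::num::NonZeroUsize;

/// Stand-in for MMTk's `Address`, which may be zero.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Address(usize);

/// Stand-in for a non-nullable `ObjectReference`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

impl ObjectReference {
    /// Checked conversion: returns `None` for the zero address.
    pub fn from_raw_address(addr: Address) -> Option<ObjectReference> {
        NonZeroUsize::new(addr.0).map(ObjectReference)
    }

    /// Unchecked conversion.
    ///
    /// # Safety
    /// The caller must guarantee that `addr` is not zero.
    pub unsafe fn from_raw_address_unchecked(addr: Address) -> ObjectReference {
        debug_assert!(addr.0 != 0);
        // SAFETY: the caller guarantees that `addr` is non-zero.
        ObjectReference(unsafe { NonZeroUsize::new_unchecked(addr.0) })
    }
}

/// Hypothetical helper: turn every cell of a block into an `ObjectReference`.
pub fn cells_in_block(
    block_start: Address,
    cell_size: usize,
    num_cells: usize,
) -> Vec<ObjectReference> {
    assert!(block_start.0 != 0, "a block never starts at address 0");
    (0..num_cells)
        .map(|i| {
            let cell = Address(block_start.0 + i * cell_size);
            // SAFETY: the block start is non-zero and the offset only increases it
            // (assuming the block does not wrap around the address space).
            unsafe { ObjectReference::from_raw_address_unchecked(cell) }
        })
        .collect()
}
```

The non-zero assumption is made once, at the block level, and the per-cell conversions rely on it.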
In the OpenJDK binding, from_raw_address_unchecked
is used in several more places:
- In ObjectModel::copy: We use from_raw_address_unchecked to convert the result of alloc_copy into an ObjectReference. We are sure the result of alloc_copy can never be zero.
  - Wait. What if we run out of defragmentation space?
- ObjectModel::get_reference_when_copied_to: Since MarkCompact has already figured out the destination of objects, the destination cannot be the zero address.
- ObjectModel::address_to_ref: MMTk never calls this. It is supposed to be the inverse operation of ref_to_address, which always converts from a valid ObjectReference, therefore the resulting address can never be 0.
- OpenJDKEdge::decompress(v: u32) -> Option<ObjectReference>: Now we need to handle the case of v == 0 manually. For non-zero v, we do arithmetic operations on the address and get a usize result. We know the result cannot be 0 if v is not 0, but the Rust language doesn't know it. We cannot do the addition on NonZeroUsize because the result may be zero.
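The decompress case can be sketched roughly as follows. HEAP_BASE and SHIFT are made-up constants for illustration (the real OpenJDK binding obtains the base and shift from the VM), a 64-bit target is assumed, and ObjectReference is a stand-in type:

```rust
use std::num::NonZeroUsize;

/// Stand-in for a non-nullable `ObjectReference`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical decoding parameters, chosen only for this example.
const HEAP_BASE: usize = 0x1000_0000;
const SHIFT: u32 = 3;

/// Decode a 32-bit compressed pointer. A compressed value of 0 means "null".
pub fn decompress(v: u32) -> Option<ObjectReference> {
    if v == 0 {
        // The null case has to be handled explicitly.
        return None;
    }
    let addr = HEAP_BASE + ((v as usize) << SHIFT);
    debug_assert!(addr != 0);
    // SAFETY: `v` is non-zero and the arithmetic does not wrap on a 64-bit
    // target, so `addr` is strictly greater than `HEAP_BASE` and thus non-zero.
    // The compiler cannot prove this, hence the unchecked constructor.
    Some(ObjectReference(unsafe { NonZeroUsize::new_unchecked(addr) }))
}
```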
Sometimes I feel that adding a NonZeroAddress
may reduce the number of from_raw_address_unchecked
. But the assumption of non-zero must be made at one of the steps of conversion, and it will always be unsafe
(in the Rust language's sense, of course. We know from the algorithm that it is safe).
For languages that use a special singleton object to represent its 'NULL' reference, can they use Some(ObjectReference)
for the singleton object? When they allocate and initialize the object, they will use Some(ObjectReference)
to refer to it when using any MMTk API. However, during tracing such as Edge::load()
, if they use Some(ObjectReference)
for the special singleton, we (MMTk) will trace the single object (which is undesirable). If they use None
, that means the binding treats the same object differently by using Some(ObjectReference)
or None
in different scenarios. This would sound confusing.
It depends on how that singleton is allocated.
If the singleton is allocated in the MMTk heap, then Edge::load()
should return Some(objref)
. mmtk-core will trace that object, and forward it, too (if it is a moving GC).
If the singleton is a C static variable, or if it is allocated by malloc, it should return None
, and mmtk-core will not touch that field.
In CPython, the None
object is allocated as a C static variable _Py_NoneStruct
, so Edge::load()
should return None
. CPython uses naive RC, and in the newest version it elides RC operations for None
.
In Julia, the missing
variable points to an ordinary Julia object: const missing = Missing()
, so Edge::load()
should return Some(objref)
.
jl_nothing
is allocated with jl_nothing = jl_gc_permobj(0, jl_nothing_type);
. Since it is permanent and non-moving, it doesn't matter if we trace it or not. So it should work regardless of whether Edge::load()
returns None
or Some(objref)
.
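As a rough sketch of the "C static variable" case, a slot type could filter out the singleton when loading. ToySlot and its fields are hypothetical and only model the idea; a real binding would compare against the address of its own singleton (such as &_Py_NoneStruct in CPython):

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Stand-in for a non-nullable `ObjectReference`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Hypothetical slot of a VM whose NULL-like singleton lives outside the MMTk heap.
pub struct ToySlot<'a> {
    /// The memory word this slot refers to.
    pub cell: &'a Cell<usize>,
    /// Address of the statically allocated singleton.
    pub static_singleton: usize,
}

impl ToySlot<'_> {
    /// Returns `None` for the off-heap singleton (and for 0), so that
    /// mmtk-core neither traces the singleton nor touches the slot.
    pub fn load(&self) -> Option<ObjectReference> {
        let raw = self.cell.get();
        if raw == self.static_singleton {
            return None;
        }
        NonZeroUsize::new(raw).map(ObjectReference)
    }
}
```

If the singleton were an ordinary heap object instead (like missing in Julia), the comparison would simply be omitted and load() would return Some(objref).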
This sounds like a pretty confusing definition of Edge::load()
as a part of public API. What would the comments for Edge::load()
be?
Also based on what you said, in some cases, the special singleton object will be put to the tracing queue every time we see such an object in an empty slot, and that probably will incur an overhead. Usually those should be dealt with using a check (like our old null check in trace_object
). If the object is considered 'null', they should not be put to the queue at all.
The current doc is:
pub trait Edge: Copy + Send + Debug + PartialEq + Eq + Hash {
/// Load object reference from the slot.
///
/// If the slot is not holding an object reference (For example, if it is holding NULL or a
/// tagged non-reference value. See trait-level doc comment.), this method should return
/// `None`.
///
/// If the slot holds an object reference with tag bits, the returned value shall be the object
/// reference with the tag bits removed.
fn load(&self) -> Option<ObjectReference>;
It may be worth mentioning those singleton objects in the comments. The main idea is, return Some(objref)
if
- The slot holds an object reference that can be traced, and
- should be traced.
"Can be traced" rules out null
, nil
, true
, false
small integers, etc.
"Should be traced" mainly affects objects in the immortal space. It's harmless not to trace those immortal objects as long as (1) they don't point to other objects, or (2) they are treated as roots.
Statically allocated objects should not be traced by MMTk, unless the VM implements ActivePlan::vm_trace_object
and traces those static objects. But I prefer treating those static objects as non-moving rooted objects so that they don't need to be traced.
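For the "can be traced" part, a sketch of loading from a tagged slot may help. The tag encoding below is invented for this example (it is not the encoding of CRuby, V8 or any other VM), and load_tagged is a hypothetical free function standing in for an Edge::load() implementation:

```rust
use std::num::NonZeroUsize;

/// Stand-in for a non-nullable `ObjectReference`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Invented encoding: the low three bits of a slot value are tag bits.
const TAG_MASK: usize = 0b111;
/// Invented encoding: this tag marks a small integer stored in the upper bits.
const TAG_SMALL_INT: usize = 0b001;

/// Interpret a raw slot value the way `Edge::load()` is documented to.
pub fn load_tagged(raw: usize) -> Option<ObjectReference> {
    if raw & TAG_MASK == TAG_SMALL_INT {
        // A small integer is not a reference.
        return None;
    }
    // Remove the tag bits. Values that consist of tag bits only
    // (for example nil, true and false in this invented encoding) become 0,
    // which `NonZeroUsize::new` maps to `None`.
    NonZeroUsize::new(raw & !TAG_MASK).map(ObjectReference)
}
```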
Some special objects, such as missing
in Julia, should be queued because they are in the MMTk heap, and may move. Therefore all fields pointing to missing
need to be updated if it is moved. If this is a bottleneck, we can allocate missing
in the immortal space. In this case, trace_object
(particularly, the dynamic-dispatching PlanTraceObject::trace_object
and SFT::trace_object
) will do a check, but the check is "whether the object is in the immortal space", not "if it is missing
or nothing
".
Steve once proposed doing the dispatch when scanning the object and putting objects in different queues. If we implement that in the future, those special objects (as static variables, or in immortal spaces, or objects known to be rooted and non-movable) will be filtered out before they are enqueued.
- Is it possible for someone to make the decision of what 'can be traced' and 'should be traced' without knowing the internals of MMTk?
- The word 'tracing' also makes assumptions about the GC algorithm.
- This also implies Edge::load() is only used for the tracing purpose.
Those objects should be immortal, and should not be traced. In the current Scanning::scan_object()
method, the binding will simply use the edge visitor for every slot, and there isn't a way to rule out those objects.
I see your point. It's probably not a good idea to define the semantics of Edge::load()
based on tracing. Scanning::scan_object
is not (only) designed for tracing, and that's part of why I removed the TransitiveClosure
trait and replaced it with EdgeVisitor
in the first place. But we know that we don't need to trace every edge for GC to work correctly, and we may use Edge::load()
as an opportunity for optimization.
- Is it possible for someone to make the decision of what 'can be traced' and 'should be traced' without knowing the internals of MMTk?
The simplest way to decide is, if the slot contains a reference to an object in the heap, then return Some(objref)
. This is the safest.
In the simplest case, the VM only ever allocates objects using alloc()
, that is, allocating in the MMTk heap. If we use the VO bit, is_mmtk_object(objref)
should also return true
for such objects.
But if the VM implements ActivePlan::vm_trace_object
, things may become a bit complicated, because Edge::load()
has to return Some(objref)
so that objref
can be traced by ActivePlan::vm_trace_object
. But the VM can also choose to filter out some off-heap objects so that they don't reach trace_object
and therefore not vm_trace_object
.
For Julia, jl_nothing
and missing
are MMTk heap objects. Tracing them will always be safe. But we know it may have overhead to trace them, and we may want to optimize it by not tracing them. (This needs to be verified. If very few fields ever point to jl_nothing
or missing
, then performing checking for every single ObjectReference
may be more expensive than simply tracing them.) If we decide to do the optimization, we need to know (1) if the object may move, (2) if the object is rooted, and (3) if the object has non-immortal children. Currently, (1) we have a pinning API, and it is part of the semantics that the immortal space is non-moving; (2) we allow the VM to give a list of pinning roots; (3) we do have transitive pinning roots, but some VMs may simply know that some objects are leaves. I think that's enough for VMs to decide not to trace some fields using only the public API and semantics. At least it is so for tracing GC. We need more semantics for RC, for example, whether we should apply any inc/dec for objects in the immortal space.
For CPython, Py_None
(the Python object) is a static variable. It is easy to rule it out by letting Edge::load()
return None
(the Rust value) so that Py_None
(the Python object) will not reach trace_object
or vm_trace_object
.
I think at this moment, we don't mention the optimization in Edge::load()
, and say "it should return Some(objref)
if the slot contains a reference to the MMTk heap or should be traced by vm_trace_object
". We can discuss that optimization separately and update the contract of Edge::load()
(as well as whether Scanning::scan_object_and_trace_edge
should skip some edges).
- The word 'tracing' also makes assumptions about the GC algorithm.
The same is true for ActivePlan::vm_trace_object
. If a VM implements that, the VM surely knows something about tracing. This allows the VM to decide whether they should trace objects outside the MMTk heap so that they can be traced by vm_trace_object
.
This also implies that we haven't designed the reference counting counterpart of vm_trace_object
. But if a VM needs that, the VM probably already knows the internals of MMTk.
- This also implies Edge::load() is only used for the tracing purpose.
Edge::load()
should not only be used for tracing. It can be used for reference counting, too. Scanning::scan_object
will enumerate all slots as Edge
anyway, but when loading, some slots will return None
. That's enough for identifying children, and do inc/dec operations.
And I also expect Edge::load()
to be used for heap dumping, too. Depending on what we need, we may need to traverse every node and edge, or we may not. Suppose we need to find all outgoing edges from a node, then Edge::load()
shall not return None
for those objects such as jl_nothing
and missing
.
If we want to do the optimization of omitting some edges (such as those pointing to jl_nothing
, missing
and PyNone
), we probably need an option (such as a parameter to Edge::load()
and Scanning::scan_object
) to disable such optimization in some cases.
Right. Scanning::scan_object
currently cannot rule out those objects. There are other chances for them to be ruled out:
- in Edge::load() (just compare against some special values or look at special type tags or bit patterns),
- in SFTMap or PlanTraceObject (they will be dispatched to the immortal space, or other spaces with desired properties, and treated specially), or
- in vm_trace_object (it's up to the VM).
One of them may be more efficient than others according to the nature of the concrete VM.
@wenyuzhao asked how to store null pointer to an Edge
(a slot).
In general, this should be done in a VM-specific way, because not all programming languages have null pointers. For example, I made changes to the OpenJDK binding and added an OpenJDK-specific method OpenJDKEdge::store_null
:
impl<const COMPRESSED: bool> OpenJDKEdge<COMPRESSED> {
    // ...
    pub fn store_null(&self) {
        if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
            // Store a zero of the slot's width: u32 for a compressed slot,
            // a full address otherwise.
            if COMPRESSED {
                if self.is_compressed() {
                    self.x86_write_unaligned::<u32, true>(0)
                } else {
                    self.x86_write_unaligned::<Address, true>(Address::ZERO)
                }
            } else {
                self.x86_write_unaligned::<Address, false>(Address::ZERO)
            }
        } else {
            debug_assert!(!COMPRESSED);
            unsafe { self.addr.store(0) }
        }
    }
}
But for debug purposes, we may add a method to the Edge trait to store 0 (or an arbitrary value) to a slot, just to let it hold an obviously invalid value. That's for debug purposes only. Currently, mmtk-core never stores 0 to an Edge. But it may be VM-specific, too. For example, in Lua, a slot occupies two words instead of one. It is unclear what signature such a function should have.
trait Edge {
    fn store_null(&self);
    // What if `T` doesn't have the same size as the slot?
    fn store_arbitrary_value<T>(&self, arbitrary_value: T);
}
MMTk can only use None
in Option<ObjectReference>
to refer to a null reference, which may be too restrictive. In your languages survey, when a language has different representations of 'null references' and MMTk can only refer to them as None
, it causes information loss.
I am wondering if we would like to introduce a type such as NullableObjectReference
and allow the VM to implement it as an amendment. MMTk can use NullableObjectReference
when a reference may be null, which is more expressive than Option<ObjectReference>
. And MMTk still uses ObjectReference
when a reference is strictly not null.
This also solves the issue in the above comment, as MMTk is able to represent a null reference.
But the fact is, there is only one use of ObjectReference::NULL. It is in the default implementation of ReferenceGlue::clear_reference, but it is legacy and Java-specific, and Julia overrides it. There is existing code that calls .is_null() in assertions (which can be safely removed if ObjectReference is not nullable), or in places that mean "no ObjectReference" (which should have used Option<ObjectReference>). The rest of MMTk core doesn't use ObjectReference::NULL. It doesn't compare against ObjectReference::NULL, or write ObjectReference::NULL into memory. So I don't think we need to make MMTk aware of the presence of null pointers or any variants of them. MMTk doesn't care about any slot values that are not references, including null, nil, true, false, small integers, etc. All that MMTk cares about is that "the slot doesn't hold a reference".
Speaking of information loss, the only case of information loss was the bug I mentioned in slot.store(new_objref)
where new_objref
could be a "benign" NULL. If we check if the slot holds any reference, we can skip the trace_object
and the subsequent slot.store
completely. That is, if the slot doesn't hold a reference, MMTk shouldn't touch it.
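The intended slot processing can be sketched as below. The Edge trait here is reduced to the two methods needed for the example, WordSlot is a toy slot type, and process_edge is a hypothetical name, so this is an illustration of the idea rather than the actual mmtk-core code:

```rust
use std::cell::Cell;
use std::num::NonZeroUsize;

/// Stand-in for a non-nullable `ObjectReference`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ObjectReference(NonZeroUsize);

/// Simplified version of the `Edge` trait.
pub trait Edge {
    fn load(&self) -> Option<ObjectReference>;
    fn store(&self, object: ObjectReference);
}

/// Toy slot: one word, where 0 means "no reference".
pub struct WordSlot(pub Cell<usize>);

impl Edge for WordSlot {
    fn load(&self) -> Option<ObjectReference> {
        NonZeroUsize::new(self.0.get()).map(ObjectReference)
    }
    fn store(&self, object: ObjectReference) {
        self.0.set(object.0.get());
    }
}

/// Trace the object in the slot, if any, and write back its new address.
pub fn process_edge<E: Edge>(
    slot: &E,
    trace_object: impl FnOnce(ObjectReference) -> ObjectReference,
) {
    if let Some(object) = slot.load() {
        let new_object = trace_object(object);
        slot.store(new_object);
    }
    // If `load()` returned `None`, the slot is left untouched.
}
```

Because the store only happens inside the Some branch, a null, nil or tagged value in the slot is never overwritten.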
I don't think we want to focus on the current code base when discussing an MEP. The current design is obviously very Java-centric, and is not general. The question is whether the proposal is flexible enough for the future. Not being able to express a proper null pointer could be a potential issue. But I don't think it is a show stopper, and it can be amended.
This is slightly off-topic, but perhaps we want to add functions like fn is_null() -> bool
and fn store_null()
to the Edge
/Slot
trait to aid debugging and help implement reference processing.
I think it is OK for debugging. But just like I mentioned, currently MMTk doesn't seem to care about any form of null references. Let's see if we need them in the future, and we shall add them when needed.
But if we want to implement reference processing, this may not be the right choice. For example, Julia has two forms of null references, nothing
and missing
. To clear a reference in Julia, it needs to set it to nothing
. But that may not be the right kind of null reference for other scenarios. So it is unclear which kind of null reference store_null()
should store. It is better to leave it to the VM. Nothing prevents the VM from implementing additional methods for the concrete slot type (such as OpenJDKEdge) and the VM-specific weak reference type. By the way, ultimately we should move the reference processor to the binding.
During the meeting on 31 January 2024, we reached consensus on this MEP. We will wait for 24 hours for anyone to raise objections against this MEP. If nobody expresses an objection before 2pm, 1 February 2024 Canberra Time (UTC+11:00), we will declare this MEP as passed.
I made some changes according to our discussion today.
- Explicitly listed "skipping edges to singleton NULL-like objects" as a non-goal, and linked to https://github.com/mmtk/mmtk-core/issues/1076
- Mentioned that alloc may return 0.
- Listed the alternatives of (1) keeping the status quo and doing nothing, and (2) allowing ObjectReference to hold other variants of NULL values determined by the binding.
- Mentioned compressed pointers and NULL in the Assumptions section.
Since no objections were raised after the meeting, this MEP passes the review.