mmtk-core
vm_space (boot image space) refactoring
We currently hard-code a size for the VM space, and assume that the VM space is at the beginning of our heap range. This is essentially the setting inherited from JikesRVM. To make it more general, I think we should allow the binding to specify the start and the size of the VM space. We can still assume that the VM space lies within the range [HEAP_START, HEAP_END).
With recent changes in https://github.com/mmtk/mmtk-core/pull/625 and https://github.com/mmtk/mmtk-core/pull/629, we should be able to let a binding specify the VM space range through options.
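As a rough illustration of what "binding-specified range" means, the sketch below validates a start/size pair against the heap bounds. The type, constants, and the constraint that the range must fall within [HEAP_START, HEAP_END) follow the description above; none of this is the actual mmtk-core API, and the addresses are made up.

```rust
// Hypothetical sketch: the binding hands MMTk the VM space bounds, and MMTk
// validates them against its heap range. Names and addresses are illustrative.

const HEAP_START: usize = 0x2000_0000;
const HEAP_END: usize = 0xa000_0000;

/// Binding-specified VM space range.
struct VmSpaceSpec {
    start: usize,
    size: usize,
}

impl VmSpaceSpec {
    /// Accept the range only if it lies entirely within [HEAP_START, HEAP_END).
    fn validate(&self) -> Result<(), String> {
        let end = self
            .start
            .checked_add(self.size)
            .ok_or_else(|| "VM space range overflows the address space".to_string())?;
        if self.start >= HEAP_START && end <= HEAP_END {
            Ok(())
        } else {
            Err(format!(
                "VM space {:#x}..{:#x} is outside [HEAP_START, HEAP_END)",
                self.start, end
            ))
        }
    }
}

fn main() {
    let ok = VmSpaceSpec { start: 0x3000_0000, size: 0x100_0000 };
    assert!(ok.validate().is_ok());
    let bad = VmSpaceSpec { start: 0x1000_0000, size: 0x100_0000 };
    assert!(bad.validate().is_err());
}
```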
There are a few questions we need to figure out before doing the change:
- Can the VM space be discontiguous? I think we need to allow it.
- I assume the VM space needs to be in the address range we use. Does the VM space need to be in the heap range?
- What are the semantics of the VM space? Do we handle the tracing of objects in the VM space (including metadata for those objects)?
- Do we still need the VM space, given that we have `vm_trace_object()` now?
We discussed this topic.
The original idea, and what the current code reflects, is: we have a specific space called the VM space. It currently uses an `ImmortalSpace`, but it should be a special space whose semantics are defined by the VM. It is allocated and managed by the VM. We would want the binding to tell us its address range so we can do dispatching. But other than dispatching, everything is up to the VM, including how to trace objects in the space, liveness, movability, etc. For example, MMTk does the dispatching, and the special VM space calls into the binding for its behaviours.
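The dispatching idea can be sketched like this: MMTk knows only the VM space's address range, and forwards the actual semantics (tracing, liveness, movability) to a trait the binding implements. The trait and type names here are hypothetical, not mmtk-core's real `VMBinding` interface.

```rust
// Illustrative sketch of the "original idea": MMTk dispatches by address
// range; everything else is supplied by the VM. Names are hypothetical.

/// Behaviours the VM supplies for objects in its own space.
trait VmSpaceSemantics {
    fn trace_object(&self, object: usize) -> usize;
    fn is_live(&self, object: usize) -> bool;
    fn is_movable(&self) -> bool;
}

struct VmSpace<S: VmSpaceSemantics> {
    start: usize,
    end: usize,
    semantics: S,
}

impl<S: VmSpaceSemantics> VmSpace<S> {
    fn contains(&self, addr: usize) -> bool {
        addr >= self.start && addr < self.end
    }

    /// MMTk dispatches here when an object falls in [start, end);
    /// the behaviour itself comes from the binding.
    fn trace_object(&self, object: usize) -> usize {
        debug_assert!(self.contains(object));
        self.semantics.trace_object(object)
    }
}

/// Example binding-side semantics: a boot image that is always live
/// and never moves, so tracing does not forward objects.
struct BootImage;
impl VmSpaceSemantics for BootImage {
    fn trace_object(&self, object: usize) -> usize { object }
    fn is_live(&self, _object: usize) -> bool { true }
    fn is_movable(&self) -> bool { false }
}

fn main() {
    let space = VmSpace { start: 0x1000, end: 0x2000, semantics: BootImage };
    assert!(space.contains(0x1800));
    assert!(!space.contains(0x2000));
    assert_eq!(space.trace_object(0x1800), 0x1800);
    assert!(!space.semantics.is_movable());
}
```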
A maybe simpler and cleaner idea is that MMTk only needs to know its own spaces. Anything that is not in MMTk's spaces could be a VM-allocated object, a VM-managed pointer, or an invalid reference. So whenever we encounter an address that is not in MMTk's spaces, we just call the binding -- the binding can then decide whether it is a known address/object for the binding, or a rogue/invalid pointer. The current `vm_trace_object()` reflects this design, and we may need more methods like `vm_trace_object()` for when we encounter unknown references in different scenarios.
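The contrast with the dispatching design can be sketched as a simple fallback: check MMTk's own spaces first, and hand anything unknown to the binding. The stand-in `vm_trace_object` below is only illustrative; a real binding would consult its own heap or boot-image data structures, not a list of ranges.

```rust
// Sketch of the fallback design: MMTk checks only its own spaces; any
// unknown address goes to the binding, which decides whether it is a
// VM-managed object or a rogue pointer. All names are hypothetical.

#[derive(Debug, PartialEq)]
enum TraceResult {
    MmtkObject(usize), // traced by an MMTk policy
    VmObject(usize),   // claimed by the binding via vm_trace_object
    Invalid,           // rejected by the binding as a rogue pointer
}

/// Stand-in for the binding-side hook.
fn vm_trace_object(addr: usize, vm_known: &[(usize, usize)]) -> Option<usize> {
    vm_known
        .iter()
        .any(|&(start, end)| addr >= start && addr < end)
        .then_some(addr)
}

fn trace(
    addr: usize,
    mmtk_spaces: &[(usize, usize)],
    vm_known: &[(usize, usize)],
) -> TraceResult {
    if mmtk_spaces.iter().any(|&(s, e)| addr >= s && addr < e) {
        TraceResult::MmtkObject(addr)
    } else {
        // Not ours: ask the binding.
        match vm_trace_object(addr, vm_known) {
            Some(obj) => TraceResult::VmObject(obj),
            None => TraceResult::Invalid,
        }
    }
}

fn main() {
    let mmtk = [(0x2000_0000, 0x4000_0000)];
    let vm = [(0xf000_0000, 0xf100_0000)];
    assert_eq!(trace(0x2000_1000, &mmtk, &vm), TraceResult::MmtkObject(0x2000_1000));
    assert_eq!(trace(0xf000_0040, &mmtk, &vm), TraceResult::VmObject(0xf000_0040));
    assert_eq!(trace(0x1234, &mmtk, &vm), TraceResult::Invalid);
}
```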
Though MMTk only knows its own spaces, we may allow VMs to create spaces in MMTk. They can implement their own policies and semantics, or reuse MMTk's policies, to create spaces. This means we can still do dispatching for objects in those spaces, and the semantics are defined by the VM. This is similar to the original idea, except that we expose a way for bindings to create spaces, rather than exposing a specific VM space. We will need to deal with discontiguous spaces on 64 bits. For VM-managed spaces, their address range should not conflict with our heap range.
The issue with the simpler idea is object metadata. If the VM uses side metadata, we would need to be aware of the VM space so we can mmap the side metadata for the region. Although most of the metadata is used during GC (in `trace_object`), there are exceptions, such as the log bit used by write barriers.
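To make the metadata cost concrete, here is a back-of-the-envelope calculation of how much side metadata a VM space region would need mmapped up front. The ratio (bits per region of heap) and page size are illustrative, not mmtk-core's actual side-metadata layout.

```rust
// Illustrative arithmetic (not mmtk-core's real layout): a contiguous
// side-metadata bitmap holds `bits_per_region` metadata bits for every
// `region_bytes` of heap, so registering a VM space of `space_size` bytes
// means mmapping the corresponding metadata, rounded up to whole pages.

const PAGE_SIZE: usize = 4096;

fn div_ceil(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

fn side_metadata_pages(space_size: usize, region_bytes: usize, bits_per_region: usize) -> usize {
    let regions = div_ceil(space_size, region_bytes);
    let metadata_bytes = div_ceil(regions * bits_per_region, 8);
    div_ceil(metadata_bytes, PAGE_SIZE)
}

fn main() {
    // One log bit per 8-byte granule over a 16 MiB VM space:
    // 2 Mi bits = 256 KiB of metadata = 64 pages to mmap up front.
    assert_eq!(side_metadata_pages(16 << 20, 8, 1), 64);
}
```

This is why "MMTk knows nothing outside its own spaces" cannot hold completely: the log bit is touched by the write barrier outside GC, so the metadata pages for the VM space must exist before the first write into it.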
Implementing a space can usually reuse many internal types, like `CommonSpace` and `PageResource`. Those are internal types and not public to users. In this case, users would need to implement the VM space without reusing any MMTk internal types, which sounds like a major task. One way to solve this is to implement most of the `VMSpace` inside MMTk, and only forward certain calls to the bindings.
https://github.com/mmtk/mmtk-core/pull/802 allows the runtime to specify the start and the size of a VM space, even after MMTk is initialized.
The PR assumes the VM space range is outside the heap range we use for our internal spaces (`AVAILABLE_START` and `AVAILABLE_END`). This makes things easier.
One leftover issue for the PR is that we need a way to tell if an object is in MMTk's heap (in internal spaces or the VM space). In Java MMTk, based on the fact that the VM space is next to the internal spaces, a bounds check is possible -- any object between `HEAP_START` and `HEAP_END` is in the MMTk heap. With MMTk core, as a runtime can specify any address range as the VM space, we cannot use a bounds check any more. We could use the `SFT` or `VMMap`. I haven't checked if this works.
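The SFT/VMMap alternative to a bounds check can be sketched as a per-chunk table: every space, including a VM space at an arbitrary address, registers the chunks it covers, and membership becomes a constant-time table lookup. The chunk size, the `HashMap` (the real tables are flat arrays indexed by chunk), and all names below are simplifications.

```rust
// Simplified SFT/VMMap-style lookup: map address chunks to their owning
// space, so a VM space at any address can be identified without a bounds
// check against [HEAP_START, HEAP_END). Names and sizes are illustrative.

use std::collections::HashMap;

const LOG_CHUNK: usize = 22; // 4 MiB chunks

#[derive(Clone, Copy, PartialEq, Debug)]
enum SpaceKind {
    Internal,
    VmSpace,
}

#[derive(Default)]
struct ChunkMap {
    table: HashMap<usize, SpaceKind>, // chunk index -> owning space
}

impl ChunkMap {
    /// Register every chunk covered by [start, start + size).
    fn register(&mut self, start: usize, size: usize, kind: SpaceKind) {
        for chunk in (start >> LOG_CHUNK)..=((start + size - 1) >> LOG_CHUNK) {
            self.table.insert(chunk, kind);
        }
    }

    /// Replaces the old bounds check: an address is in MMTk's heap iff
    /// some space (internal or VM) has registered its chunk.
    fn is_in_mmtk_spaces(&self, addr: usize) -> bool {
        self.table.contains_key(&(addr >> LOG_CHUNK))
    }
}

fn main() {
    let mut map = ChunkMap::default();
    map.register(0x2000_0000, 64 << 20, SpaceKind::Internal);
    map.register(0xf000_0000, 16 << 20, SpaceKind::VmSpace); // arbitrary VM space address
    assert!(map.is_in_mmtk_spaces(0x2000_1000));
    assert!(map.is_in_mmtk_spaces(0xf080_0000));
    assert!(!map.is_in_mmtk_spaces(0x1000));
}
```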
https://github.com/mmtk/mmtk-core/pull/864 further allows discontiguous VM space.
`is_in_mmtk_spaces()` can tell if an object is in MMTk's heap (including the VM space).
This issue can be closed now.