mmtk-core

Allow internal pointers to point to the end of an object

Open qinsoon opened this issue 2 months ago • 4 comments

qinsoon avatar Oct 29 '25 03:10 qinsoon

Suppose two objects A and B are allocated adjacent to each other, with A before B, and A has a char buffer[0] as its last field. If &A.buffer happens to equal &B, would &A.buffer be considered an internal pointer of A or an internal pointer of B? This may affect find_object_from_internal_pointer, which is currently the only use case of is_internal_ptr.

And if this can happen in a VM and the stack contains a pointer equal to &B, would that pointer also keep object A alive, because it could potentially be an interior pointer to &A.buffer as well? Which object would find_object_from_internal_pointer(&B, LIMIT) return?

wks avatar Oct 29 '25 05:10 wks

> Suppose two objects A and B are allocated adjacent to each other, with A before B, and A has a char buffer[0] as its last field. If &A.buffer happens to equal &B, would &A.buffer be considered an internal pointer of A or an internal pointer of B? This may affect find_object_from_internal_pointer, which is currently the only use case of is_internal_ptr.
>
> And if this can happen in a VM and the stack contains a pointer equal to &B, would that pointer also keep object A alive, because it could potentially be an interior pointer to &A.buffer as well? Which object would find_object_from_internal_pointer(&B, LIMIT) return?

It depends on the runtime, specifically on how it defines the object reference for MMTk. The VO bit is set at the address of the object reference.

For example, suppose we have two objects, each with an 8-byte header and an 8-byte payload (including the zero-size field): a_object_start = 0x1000, a_object_ref = 0x1008, b_object_start = 0x1010, and b_object_ref = 0x1018. Then 0x1010 is an internal pointer for a. Any address strictly between 0x1010 and 0x1018 is not an internal pointer, because it lies before b's object reference, where the VO bit is set.

The runtime has to be able to tell if a pointer is a pointer to the next object, or an internal pointer to the previous object.
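The layout in this example can be checked with a small toy model. This is only a sketch: ToyObject, is_internal and example_objects are hypothetical names, not mmtk-core types or functions, and the sketch assumes that internal pointers range from the object reference up to and including the end of the object, as the issue title proposes.

```rust
/// Toy object layout (hypothetical; not an mmtk-core type).
#[derive(Clone, Copy)]
struct ToyObject {
    /// First byte of the header.
    start: usize,
    /// Address the runtime hands to MMTk; the VO bit is set here.
    object_ref: usize,
    /// One past the last byte of the object.
    end: usize,
}

impl ToyObject {
    /// Number of bytes between the object start and the object reference.
    fn header_bytes(self) -> usize {
        self.object_ref - self.start
    }
}

/// Assumption: an address is an internal pointer of `obj` if it lies in
/// `[object_ref, end]`, with the end address included.
fn is_internal(obj: ToyObject, addr: usize) -> bool {
    obj.object_ref <= addr && addr <= obj.end
}

/// The two adjacent objects from the example: 8-byte header, 8-byte payload.
fn example_objects() -> (ToyObject, ToyObject) {
    let a = ToyObject { start: 0x1000, object_ref: 0x1008, end: 0x1010 };
    let b = ToyObject { start: 0x1010, object_ref: 0x1018, end: 0x1020 };
    (a, b)
}
```

Under these assumptions, 0x1010 is internal to a only, and an address such as 0x1014 (inside b's header) is internal to neither object.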

qinsoon avatar Oct 29 '25 06:10 qinsoon

>> Suppose two objects A and B are allocated adjacent to each other, with A before B, and A has a char buffer[0] as its last field. If &A.buffer happens to equal &B, would &A.buffer be considered an internal pointer of A or an internal pointer of B? This may affect find_object_from_internal_pointer, which is currently the only use case of is_internal_ptr. And if this can happen in a VM and the stack contains a pointer equal to &B, would that pointer also keep object A alive, because it could potentially be an interior pointer to &A.buffer as well? Which object would find_object_from_internal_pointer(&B, LIMIT) return?
>
> It depends on the runtime, specifically on how it defines the object reference for MMTk. The VO bit is set at the address of the object reference.
>
> For example, suppose we have two objects, each with an 8-byte header and an 8-byte payload (including the zero-size field): a_object_start = 0x1000, a_object_ref = 0x1008, b_object_start = 0x1010, and b_object_ref = 0x1018. Then 0x1010 is an internal pointer for a. Any address strictly between 0x1010 and 0x1018 is not an internal pointer, because it lies before b's object reference, where the VO bit is set.
>
> The runtime has to be able to tell if a pointer is a pointer to the next object, or an internal pointer to the previous object.

I think it is fine as long as the runtime can tell whether it points to the next object or the previous object.

Currently, find_object_from_internal_pointer(start, limit) starts by checking the VO bit at start, so if &A.buffer == &B, it will return the object reference of B. Is this a problem for Julia? I am OK with it if the Julia VM has a way to tell them apart. But maybe we should edit the doc comment of find_object_from_internal_pointer to warn the user about this corner case.
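The corner case can be illustrated with a toy model of the lookup. This is a sketch, not the mmtk-core implementation: ToyVoBits and its BTreeMap representation are hypothetical, and the only behaviour taken from the comment above is that the search checks the VO bit at the given address first. The backward scan and the inclusive end check are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for VO-bit metadata (hypothetical; not an mmtk-core type).
/// Maps each object reference to the number of bytes from that reference
/// to the end of the object.
struct ToyVoBits {
    objects: BTreeMap<usize, usize>,
}

impl ToyVoBits {
    fn new(objects: &[(usize, usize)]) -> Self {
        Self { objects: objects.iter().copied().collect() }
    }

    /// Toy version of the lookup: returns the object reference that `ptr`
    /// is attributed to, searching at most `max_search_bytes` backwards.
    fn find_object_from_internal_pointer(
        &self,
        ptr: usize,
        max_search_bytes: usize,
    ) -> Option<usize> {
        // Step 1: if the VO bit is set at `ptr` itself, `ptr` is an object
        // reference and wins immediately.
        if self.objects.contains_key(&ptr) {
            return Some(ptr);
        }
        // Step 2: otherwise take the nearest VO bit below `ptr` in range.
        let lowest = ptr.saturating_sub(max_search_bytes);
        let (&object_ref, &len) = self.objects.range(lowest..ptr).next_back()?;
        // Assumption: the end address counts as internal (inclusive bound).
        if ptr <= object_ref + len {
            Some(object_ref)
        } else {
            None
        }
    }
}
```

With headerless 16-byte objects at 0x1000 and 0x1010, a query for 0x1010 is resolved in step 1 and attributed to B, never to A. With the 8-byte headers from the earlier example, the same query falls through to step 2 and is attributed to A.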

wks avatar Oct 29 '25 06:10 wks

> &A.buffer == &B

If B has a header, and &B (the object reference) is pointing to the payload (not the header), then you wouldn't have the situation of &A.buffer == &B.

If there is no header, or the object reference points to the start of the object, you would have &A.buffer == &B. But in this case, neither MMTk nor the runtime can tell whether the pointer is for A or for B. To me, that sounds like a design issue for the runtime.

> But maybe we should edit the doc comment of find_object_from_internal_pointer to warn the user about this corner case.

Will do that. I just encountered a case in Julia: the value (true, ()) is 9 bytes (including an 8-byte header), and the runtime uses an address like 0x100009 for the second field. I am still evaluating whether this change is correct for Julia.
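The arithmetic behind that address can be sketched as follows. The sketch assumes the object starts at 0x100000 and that fields are addressed relative to the end of the 8-byte header; neither detail is stated above, and the function names are hypothetical.

```rust
/// Sizes taken from the comment: 9 bytes in total, 8 of them header.
const HEADER_BYTES: usize = 8;
const TOTAL_BYTES: usize = 9;

/// Address of a field, given its offset from the start of the payload.
/// Assumption: the payload begins right after the header.
fn field_addr(object_start: usize, payload_offset: usize) -> usize {
    object_start + HEADER_BYTES + payload_offset
}

/// One past the last byte of the object.
fn object_end(object_start: usize) -> usize {
    object_start + TOTAL_BYTES
}
```

Under these assumptions the first field (the Bool) sits at payload offset 0, and the zero-size second field sits at offset 1, which is exactly the end of the object.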

qinsoon avatar Oct 29 '25 07:10 qinsoon