openj9 Extend Class Chains to support Value Types

The following is the result of an offline discussion with @jdmpapin based on https://github.com/eclipse-openj9/openj9/issues/15424#issuecomment-1170360603. It should be noted that this design does not yet take into account array flattening, though I imagine it should be relatively straightforward to do so.

Design

At the moment, Class Chains can only describe the shape of classes that do not have any flattened fields. As Value Types allow fields to get flattened, the Class Chains need to be extended so that it can properly represent the shape of classes with flattened fields.

A Class Chain currently has the following layout:

+--------------------+ // Start of chain data
| Size of chain data |
+--------------------+ // Start of class chain
| ROM Class offset   |  
| ROM Class offset   |
| ...                |
| ROM Class offset   |
+--------------------+ // End of class chain

The proposed Extended Class Chain will have the following layout

+-------------------------+ // Start of chain data
| Size of class chain     | // Size of just the class chain, not including the field data section
+-------------------------+ // Start of class chain
| ROM Class offset        |
| ROM Class offset        |
| ...                     |
| ROM Class offset        |
+-------------------------+ // End of class chain; start of flattened field data
| Flattened field count   |
| Field index             | // Identifies the first flattened field
| Class Chain offset      | // Class chain of the type of the first flattened field
| Field index             | // Identifies the second flattened field
| Class Chain offset      | // Class chain of the type of the second flattened field
| ...                     |
| Field index             | // Identifies the nth flattened field
| Class Chain offset      | // Class chain of the type of the nth flattened field
+-------------------------+ // End of extended class chain

As can be imagined, this Extended Class Chain is much bigger than the traditional Class Chain. As such, there are two points of optimization via deduplication, one of which is straightforward and in fact is part of the scheme proposed above:

The field data section of the Extended Class Chain only needs to contain flattened fields of the chain's owning class. The reason for this is because each field data section contains the offset to the class chain of the class of the flattened field. If the flattened field itself has flattened fields, then those will be addressed in the field's class' class chain.
The Extended Class Chain will need to also list out any flattened fields in its owning class' superclasses. As such, it might be worth considering some form of sub chain sharing [1][2].

Important

The indices of the flattened fields in the field data must identify a field independently of the way in which the VM chooses to lay out the fields in memory. For example:

value V3 {
   float x;
   float y;
   float z;
}

class Entity {
   V3 position; // field 0
   V3 velocity; // field 1
}

Here, velocity is field 1 regardless of whether position is flattened. This ensures that the mapping between indices and fields is unambiguous even if memory layout changes between an AOT compilation and a subsequent AOT load.

These indices could be based on the declaration order of the fields in the ROM class.

If the class chain of the superclass is included in-line into the class chain of a derived class (as it is currently, and as opposed to being shared [1][2]), then these indices must cover not only the fields declared by the derived class, but also all fields declared by all of its superclasses.

Implementation

Class Chain Generation

In TR_J9SharedCache::rememberClass:

Acquire list of flattened fields for the class to be remembered.
Generate the class chain for the class of each such field.
- Obtaining the Class Chain will require a recursive call to rememberClass for the class of the flattened field.
Generate the traditional class chain for the class to be remembered.
Append the number of flattened fields.
Append the field index and class chain offset (generated in step 2) for each field acquired in step 1.

Class Chain Validation

In TR_J9SharedCache::classMatchesCachedVersion

Acquire list of flattened fields for the class being validated.
Validate class chain of the class being validated.
Check that the list of flattened fields for the class being validated matches those listed in the class chain.
- The fields listed in the class chain must be flattened, and the fields not listed there must not be flattened.
- The order in which the fields are listed is not necessarily significant. The comparison can ignore order, or some canonical order can be imposed, e.g. where the class chain must list the flattened fields in increasing order of index.
For each flattened field, validate the class chain of the type of the field
- This will require a recursive call to classMatchesCachedVersion for the class of the flattened field.

Memoization

Both generation and validation of class chains should be memoized, with results remembered at least for the duration of the top-level call. Otherwise, it will likely be possible to create classes for which each of those operations takes exponential time, especially if value types are not required to have fields (which would allow for value types that take up no memory when flattened).

Additionally, as the code stands in the Class Chain Generation phase, the class chain is generated in an array allocated on the stack [3] before it is stored into the SCC. However, because the extended class chain may require multiple recursive calls, there is an increased chance of a stack overflow. To deal with this, a reusable buffer dynamically allocated once on the heap should be used (though it may be worth adding the ability to resize it). This is ok because there are never going to be any intermediate results in the buffer; a complete extended class chain should only be generated and stored to the SCC strictly after all recursive calls from a given frame are complete, so that frame will have exclusive access to the reusable buffer. The class chain offsets from step 2 will be available in step 5 via the memoization map.

Other considerations

There will need to be other validations added in TR_SharedCacheRelocationRuntime::createAOTHeader and TR_SharedCacheRelocationRuntime::validateAOTHeader for global configs such as whether VT is enabled, as well as the flattening threshold value.

Also it is worth noting that if the VM always flattens the same fields with a given set of VT global config, then we may not need to list out the flattened field indices (though we will still need to list out the class chain offsets).

[1] https://github.com/eclipse-openj9/openj9/issues/9702 [2] https://github.com/dsouzai/openj9/commits/classChainSharing [3] https://github.com/eclipse-openj9/openj9/blob/a9bdd6d1aee01a07acc975d410ce57a85bbde571/runtime/compiler/env/J9SharedCache.cpp#L905

Jun 30 '22 20:06 dsouzai

FYI @a7ehuo

@jdmpapin feel free to edit the desc if there's anything wrong / I've missed anything.

Jun 30 '22 20:06 dsouzai

Commenting because I'd want your opinion before changing anything based on most of the following.

the "Field offset" for say Entity.velocity in the Extended Class Chain's field data should be the offset of an unflattened velocity, and not the start of the flattened velocity. This is necessary to ensure we can get the necessary field information during class chain generation/validation.

Considering that the offsets are sensitive to whether or not anything earlier is flattened, it might be better to identify fields by something like an index according to declaration order in the ROM class (or, without sharing the chain of the base class, an index according to the concatenation of the declaration order in the ROM classes of all base classes followed by the class in question, e.g. Object → AbstractCollection → AbstractList → ArrayList, if ArrayList were to have a value type flattened into it)

| Number of field offsets | // Start of field data for first flattened field
| Field offset            |
| Field offset            |
| ...                     |
| Field offset            |
| Class Chain offset      | // Class chain of the class of the first flattened field

Did you mean to include a count of the number of these sections before the first one?

The reason for the "Field offsets" is to represent multiple flattened fields of the same type. Therefore, each flattened field data section represents a unique class.

A simpler scheme would be to have a count of the total number of flattened fields, and then for each flattened field record some kind of field ID (e.g. the index mentioned above) and the class chain offset of its type. This scheme could even be more space-efficient than the one you proposed if the average number of flattened fields sharing a type is not much larger than 1. But in any case, it wouldn't be drastically less space efficient, because most of the data is already deduplicated behind the class chain offset

Also it is worth noting that if the VM always flattens the same fields with a given set of VT global config, then we may not need to list out the flattened field offsets.

I think in this scenario we would however need to specify the class chains of the types of all of the fields that could be flattened. Consider a class C that contains a flattenable field of type Foo. If Foo is large, the VM might choose not to flatten it. Then suppose we store a class chain for C that does not specify the class chain for Foo. We could load code in a later run where Foo is now small, and the VM chooses to flatten it, but we wouldn't necessarily notice that the layout of C has changed, because the class chain doesn't tell us that Foo affects the layout of C

Mostly unrelated, but how did an offset to the SVM WKC end up in the class chains?

Jun 30 '22 20:06 jdmpapin

Considering that the offsets are sensitive to whether or not anything earlier is flattened, it might be better to identify fields by something like an index

Right, the idea behind that is what I was going for when I wrote "offset of the flattened fields in the field data refers to offsets in a class' representation where nothing is flattened"; however, "index" makes this clearer.

Did you mean to include a count of the number of these sections before the first one?

We can; I had just included the total size of the chain data and the size of the class chain section of the extended class chain. Thus, the field data would be the total size minus the size of the class chain, and since each field data has a field to specify the number of offsets, we have all the info.

A simpler scheme would be to have a count of the total number of flattened fields, and then for each flattened field record some kind of field ID (e.g. the index mentioned above) and the class chain offset of its type.

I guess in that case, the amount of space we'd use would be proportional to 2n (where n is the total number of flattened fields) whereas in the layout in the desc, it's 2*u + n (where u is the number of unique (wrt to the J9Class) flattened fields). As such, the more unique fields we have, the more space we would use, and so unless for a given class u << n (which I think is only going to be true for very small classes), your proposed layout is going to be better.

I think in this scenario we would however need to specify the class chains of the types of all of the fields that could be flattened. Consider a class C that contains a flattenable field of type Foo. If Foo is large, the VM might choose not to flatten it. Then suppose we store a class chain for C that does not specify the class chain for Foo. We could load code in a later run where Foo is now small, and the VM chooses to flatten it, but we wouldn't necessarily notice that the layout of C has changed, because the class chain doesn't tell us that Foo affects the layout of C

If Foo.C was flattened in one run but not another, then the assumption "VM always flattens the same fields with a given set of VT global config" is already invalid. If there's no way to guarantee that the same set of fields will be flattened (maybe because what can be flattened is dynamically affected by things like heap space), then we should always put the field data in the extended class chain.

Also, what does "large" and "small" mean here? As in, if Foo is large in one run, why would it be small in another?

Mostly unrelated, but how did an offset to the SVM WKC end up in the class chains?

You know, I have no idea; I think I mixed up the data in the class chain with the full buffer we store to the SCC; I've updated the desc to not have this.

Jul 04 '22 14:07 dsouzai

Right, the idea behind that is what I was going for when I wrote "offset of the flattened fields in the field data refers to offsets in a class' representation where nothing is flattened"; however, "index" makes this clearer.

:+1: - I've edited the main description to use "index" and hopefully to clarify the requirements for the field indices. I also removed the mention of fields inherited from interfaces (since interfaces can't declare fields), and I moved the class chain of a flattened type before the indices of the corresponding fields, just to consolidate the fixed-size part of that layout

I had just included the total size of the chain data and the size of the class chain section of the extended class chain. Thus, the field data would be the total size minus the size of the class chain, and since each field data has a field to specify the number of offsets, we have all the info.

Oh, I see now! Yeah, that works

so unless for a given class u << n (which I think is only going to be true for very small classes), your proposed layout is going to be better

From your setup here with u and n, the break-even point is u = n/2. Your original scheme will be smaller when u < n/2, and my scheme will be smaller when u > n/2. If you're happy to switch (either because you expect to save space on average, or just to simplify things a little), then I can change the layout diagram as well

If Foo.C was flattened in one run but not another, then the assumption "VM always flattens the same fields with a given set of VT global config" is already invalid. If there's no way to guarantee that the same set of fields will be flattened (maybe because what can be flattened is dynamically affected by things like heap space), then we should always put the field data in the extended class chain.

Also, what does "large" and "small" mean here? As in, if Foo is large in one run, why would it be small in another?

In the example I gave, we don't have Foo.C. Instead we have C.foo of type Foo. By small/large I just mean the memory footprint of an instance, e.g. Foo could be 256 bytes when we compile (and therefore not flattened), but then on load we could see a different Foo that is only 8 bytes (and therefore flattened). So the contents of Foo can be relevant to the layout of C in different processes even if compilation sees a layout of C into which Foo has not been flattened

Jul 04 '22 18:07 jdmpapin

From your setup here with u and n, the break-even point is u = n/2. Your original scheme will be smaller when u < n/2, and my scheme will be smaller when u > n/2. If you're happy to switch (either because you expect to save space on average, or just to simplify things a little), then I can change the layout diagram as well

Yeah I think it's simpler and as you mentioned, in the case where the original is more efficient, it wouldn't be drastically more so.

Jul 04 '22 18:07 dsouzai

fyi @mpirvu @vijaysun-omr @mstoodle

Jul 05 '22 19:07 dsouzai

fyi @hzongaro @tobi @hangshao0

Jul 06 '22 19:07 a7ehuo

openj9 openj9 copied to clipboard

Extend Class Chains to support Value Types

Design

Important

Implementation

Class Chain Generation

Class Chain Validation

Memoization

Other considerations

openj9
openj9 copied to clipboard