Should structs support sub-classing?

Open leafpetersen opened this issue 1 year ago • 7 comments

In the struct proposal (#2360), I propose that structs be forbidden from extending concrete structs, though I permit extension of abstract structs. There is a brief discussion of allowing more general extension at the end of the proposal. This issue is to discuss the general question of whether to allow structs to extend other concrete structs, and if so, in what fashion? Concrete questions include:

  • May a concrete struct extend another concrete struct?
  • May a concrete struct override fields from its concrete super-struct?
  • May a concrete struct add additional fields?
  • May a struct extend another struct from a different library?

My proposal was designed to meet certain goals, which are useful to review for this discussion. These are not hard requirements necessarily, but they are relevant to the discussion. Specifically:

  • It is desirable that access to struct fields can be reliably compiled to simple memory accesses.
    • This pushes against making them virtual (at least outside of the current library): while whole program analysis can detect which fields are/are not overridden, it still means that code outside of the control of a library author can substantially impact the performance characteristics of the library.
  • It is desirable that access to struct fields can be reliably compiled to simple memory accesses even in a modular compilation setting.
    • This pushes against making fields virtual outside of the current library.
  • It is desirable that the memory layout (and specifically the size in memory) be predictable from the declaration to enable reliable unboxing (again, including in a modular compilation setting if possible).
    • This pushes against both virtuality, and allowing sub-structs to add fields.
  • It is desirable that field accesses on structs be promotable.
    • This pushes against allowing struct fields to be implemented via getters (i.e. virtual).
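As a rough analogy for the goals above, here is a minimal C++ sketch. C++ is used only because the proposed struct syntax is not runnable today, and every name in it (`Point`, `VirtualPoint`, `ScaledPoint`, `read_x`, `check_field_access`) is hypothetical. It contrasts a final type with plain fields, whose reads are loads at a fixed offset, with a type whose "field" is a virtual getter that a subtype can replace.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a concrete struct: plain fields, no subtypes.
struct Point final {
    double x;
    double y;
};

// Hypothetical stand-in for a type whose field can be overridden by a getter.
struct VirtualPoint {
    virtual ~VirtualPoint() = default;
    virtual double x() const { return x_; }
    double x_ = 0.0;
    double y_ = 0.0;
};

// A subtype (possibly from another library) that replaces the field read
// with a computation.
struct ScaledPoint : VirtualPoint {
    double x() const override { return x_ * 2.0; }
};

// Reads a field at a fixed offset; no dispatch is involved.
inline double read_x(const Point& p) { return p.x; }

// Goes through dynamic dispatch unless the compiler can prove the dynamic type.
inline double read_x(const VirtualPoint& p) { return p.x(); }

// The layout of Point follows from its declaration alone
// (this holds on mainstream ABIs).
static_assert(offsetof(Point, y) == sizeof(double),
              "y is expected to sit directly after x");

// Small usage check.
inline void check_field_access() {
    Point p{1.5, 2.5};
    assert(read_x(p) == 1.5);

    ScaledPoint s;
    s.x_ = 3.0;
    assert(read_x(s) == 6.0);  // the subtype changed what "reading x" means
}
```

The second `read_x` is the case the goals try to avoid: its cost and result depend on subtypes that the author of `VirtualPoint` may never see.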

leafpetersen avatar Jul 29 '22 23:07 leafpetersen

cc @mit-mit @lrhn @eernstg @chloestefantsova @johnniwinther @munificent @stereotype441 @natebosch @jakemac53 @rakudrama @srujzs @sigmundch @rileyporter @mraleph @mkustermann

leafpetersen avatar Jul 30 '22 00:07 leafpetersen

The reasoning above seems to capture it well. Being more restrictive will give us more opportunity to optimize.

One may add that:

  • Shifts boxing/unboxing to a different place: If we have sufficient guarantees, we can unbox struct fields and struct arguments/return values across calls. That doesn't necessarily eliminate boxing/unboxing, though; it shifts it to a different place: if value types flow into top types or type parameters, they have to be boxed at that point.

    This may not seem common, but I can easily imagine it's very common to construct arrays of value types; using List<..SomeValueType...> will cause heavy boxing/unboxing at the border. Has any thought been given to that? (For our built-in value types, we have the dart:typed_data classes.)

  • Predictability: Unboxing is only beneficial up to a point. If the struct is large, passing such structs unboxed across calls can be very costly due to the amount of data that has to be passed. Similarly, structs with many fields can have a significantly higher memory cost if the same struct is "referenced" from multiple objects (due to storing copies instead of references).

    So implementations can choose between a) Always use unboxing if possible. This makes perf predictable (adding one field has a predictable impact) but may have the above downsides for large structs. b) Use heuristics (e.g. a size cutoff) to decide whether to use an unboxed or boxed representation. This can lead to unpredictable perf/memory changes if a developer makes small changes to the code (e.g. adds a field).

    Would we want to favor predictability here? If not, I assume we'd like to align the backends (vm/dart2js/wasm/...) to agree on heuristics, since the same Dart code runs across many platforms?
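To make option b) concrete, here is a minimal C++ sketch of a size-cutoff heuristic. It is an illustration only: the cutoff value and the names `kUnboxCutoffBytes`, `kStoredInline`, `Stored`, `Small`, and `Grown` are assumptions, not something any Dart backend is known to use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical cutoff: types up to four doubles are stored inline ("unboxed").
constexpr std::size_t kUnboxCutoffBytes = 4 * sizeof(double);

template <class T>
constexpr bool kStoredInline = sizeof(T) <= kUnboxCutoffBytes;

// Inline copy below the cutoff, shared heap reference ("boxed") above it.
template <class T>
using Stored =
    std::conditional_t<kStoredInline<T>, T, std::shared_ptr<const T>>;

struct Small { double a, b, c, d; };     // at the cutoff: stored inline
struct Grown { double a, b, c, d, e; };  // one more field: now boxed

static_assert(kStoredInline<Small>, "Small fits under the cutoff");
static_assert(!kStoredInline<Grown>, "Grown exceeds the cutoff");

// Small usage check: the representation flips with one added field.
inline void check_cutoff() {
    assert((std::is_same<Stored<Small>, Small>::value));
    assert((std::is_same<Stored<Grown>, std::shared_ptr<const Grown>>::value));
}
```

Adding the single field `e` silently changes the representation from an inline copy to a heap reference, which is the kind of cliff that option a) avoids.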

mkustermann avatar Aug 01 '22 08:08 mkustermann

(Not sure if this is the right issue for this, but ...)

I can easily imagine it's very common to construct arrays of value types

In fact, one may want to do that in composite structs themselves, e.g.

struct ComplexNumber {
  double real;
  double im;
}
struct ComplexMatrix2x2 {
  ComplexNumber[2] values;
}

mkustermann avatar Aug 01 '22 08:08 mkustermann

TL;DR: An un-boxable type must not have any concrete subtypes.

  • May a concrete struct extend another concrete struct?

It does prompt the question of what happens on an up-cast. The specification is unclear on what an assignment to a super-type means. It seems to assume boxing when assigning to an interface or to Object, which preserves the runtime type of the struct.

Assigning a sub-struct to a super-struct type and allowing unboxing at the super-struct type seems like a guaranteed way to lose the sub-struct data and runtime-type. Unboxing only works when the static type contains all the information needed to rebox the values. (Unless the unboxing is subtype aware and retains all the extra fields too, which seems wasteful.)

So, even if a struct type can extend another struct type, to share implementation, it probably shouldn't be assignable to the struct type it extends.
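C++ shows this failure mode directly, where it is known as object slicing. The sketch below is only an analogy, and the names (`Base`, `Derived`, `name_by_value`, `name_by_reference`, `check_slicing`) are hypothetical: passing by value plays the role of unboxing at the super-struct type, and passing by reference plays the role of keeping the value boxed.

```cpp
#include <cassert>
#include <string>

struct Base {
    int id = 0;
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    int extra = 0;  // additional field that does not fit in a Base-sized slot
    std::string name() const override { return "Derived"; }
};

// "Unboxed at the supertype": only the Base part is copied, so the extra
// field and the dynamic type are lost.
inline std::string name_by_value(Base b) { return b.name(); }

// "Boxed": the reference points at the full object, so the dynamic type
// survives.
inline std::string name_by_reference(const Base& b) { return b.name(); }

// Small usage check.
inline void check_slicing() {
    Derived d;
    d.id = 1;
    d.extra = 42;
    assert(name_by_value(d) == "Base");
    assert(name_by_reference(d) == "Derived");
}
```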

Using abstract structs as super-types gives us a way out, by only ever unboxing concrete struct types. When you assign a struct type to an (abstract) super-struct type, it's always boxed (unless the compiler is smart and can see that it is never downcast or reboxed again, in which case it can throw away the remaining unused data).

This applies to all members, not just the field getters. The members need to be virtual (some override members of Object, some might implement abstract method signatures from an abstract super-struct).

In short: only if we don't introduce a subtype relation, so that extension is entirely for implementation sharing. An un-boxable type must not have any concrete subtypes.

  • May a concrete struct override fields from its concrete super-struct?

If we don't have a sub-type relationship, then it's fine. We can statically determine which variable you are accessing off any concrete struct type, and access it directly. If you have overridden a getter, we can see that at the call point.

When boxed, and accessed at a supertype interface, we use virtual/interface member access as always.

  • May a concrete struct add additional fields?

Same as above. If we don't allow you to add additional fields, then we might even allow assignment to the supertype, but then we probably need to unbox the class-ID as well, so we can reconstruct the structure. (Like we need to remember the type parameters of the struct.)
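A minimal C++ sketch of what "unboxing the class-ID as well" could look like, assuming sub-structs add no fields. All names (`ClassId`, `UnboxedTemperature`, `runtime_type_name`, `check_class_id`) are hypothetical and are not taken from the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical class IDs for a struct and a sub-struct that adds no fields.
enum class ClassId : std::uint8_t { Temperature, CalibratedTemperature };

// Unboxed form: the field payload plus the class ID needed to rebox later.
struct UnboxedTemperature {
    ClassId class_id;
    double degrees;
};

// The tag is not free: the unboxed form is larger than the payload alone.
static_assert(sizeof(UnboxedTemperature) > sizeof(double),
              "the class ID adds to the unboxed size");

// Reboxing can recover the runtime type only because the ID travelled along.
inline std::string runtime_type_name(const UnboxedTemperature& t) {
    switch (t.class_id) {
        case ClassId::Temperature: return "Temperature";
        case ClassId::CalibratedTemperature: return "CalibratedTemperature";
    }
    return "unknown";
}

// Small usage check.
inline void check_class_id() {
    UnboxedTemperature t{ClassId::CalibratedTemperature, 21.5};
    assert(runtime_type_name(t) == "CalibratedTemperature");
    assert(t.degrees == 21.5);
}
```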

  • May a struct extend another struct from a different library?

I think we have plenty of problems inside the same library. If we can solve those sufficiently, my guess is that it will also work across libraries.

lrhn avatar Aug 01 '22 11:08 lrhn

My vote would be against extending concrete structs. My question here would be: what could be a real-world use case for extending concrete structs? I am struggling to come up with one. I don't think I have ever used such extensibility in a language like C++.

mraleph avatar Aug 02 '22 13:08 mraleph

Perhaps the title of this issue should be 'Should concrete structs support sub-classing?'? With that, I believe everybody agrees on "No!" (at least until now ;-).

eernstg avatar Aug 02 '22 14:08 eernstg

@mkustermann

  • Shifts boxing/unboxing to a different place: If we have sufficient guarantees, we can unbox struct fields and struct arguments/return values across calls. That doesn't necessarily eliminate boxing/unboxing, though; it shifts it to a different place: if value types flow into top types or type parameters, they have to be boxed at that point. This may not seem common, but I can easily imagine it's very common to construct arrays of value types; using List<..SomeValueType...> will cause heavy boxing/unboxing at the border. Has any thought been given to that? (For our built-in value types, we have the dart:typed_data classes.)

Yep, no free lunches. I do think that it's valuable to be able to pass and return things on the stack, since you can sometimes essentially non-speculatively push the boxing point further out, in hopes that you never reach a boxing point. I also would hope that this would make it easier to unbox sub-objects (e.g. represent a field in an object unboxed).

For arrays, if we get variance control, I think we should expose a primitive invariant array type, which could serve as the underlying building block for the List type, as well as a tool for writing more low level code. This would at least avoid the issue of having to deal with a List<Struct> being assigned to a List<Object>. You still, of course, have the problem of deciding whether to unbox in the array or not. Ultimately, it might be worth having unboxed tuples to put this in user control when necessary.

leafpetersen avatar Aug 02 '22 18:08 leafpetersen