Should primary constructors for structs use named parameters, positional parameters, or allow either?

Open leafpetersen opened this issue 1 year ago • 8 comments

In the struct and extension struct proposal (#2360), the proposal adds primary constructors to structs. The generated primary constructor uses positional parameters, but there is discussion of the alternative approach of using named parameters. That is, the proposal is that the following:

struct Foo(int x, int y = 3);

is treated as roughly equivalent to:

class Foo {
  final int x;
  final int y;
  Foo(this.x, [this.y = 3]);
}

An alternative is to have the generated constructor use named parameters:

class Foo {
  final int x;
  final int y;
  Foo({required this.x, this.y = 3});
}

When @munificent has scraped corpora of code before, I believe his conclusion was that code is split fairly evenly between the two forms.

We could also choose to allow the user to specify: that is, we could allow the primary constructor to list zero or more parameters inside of braces, which would then become named parameters.

struct Foo(int x); // Constructor is Foo(this.x)
struct Bar({int x}); // Constructor is Bar({required this.x})
struct Baz(int x, {int y = 3}); // Constructor is Baz(this.x, {this.y = 3});

There are some obvious restrictions around the combinations of optional and named parameters, but otherwise I think this works out fine.
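As a rough sketch of what the three declarations above would correspond to in today's Dart (the struct syntax is only proposed, so these hand-written classes are an illustration, not the specified desugaring):

```dart
// Hand-written equivalents of the three proposed struct forms.

// struct Foo(int x);
class Foo {
  final int x;
  Foo(this.x);
}

// struct Bar({int x});
class Bar {
  final int x;
  Bar({required this.x});
}

// struct Baz(int x, {int y = 3});
class Baz {
  final int x;
  final int y;
  Baz(this.x, {this.y = 3});
}

// One of the restrictions: a Dart parameter list cannot contain both
// optional positional and named parameters, so a form such as
//   Qux(int x, [int y = 3], {int z = 4})
// has no valid equivalent.
```

With these classes, Baz(1) yields y == 3 and Baz(1, y: 5) overrides it.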

leafpetersen, Jul 29 '22 22:07

cc @mit-mit @lrhn @eernstg @chloestefantsova @johnniwinther @munificent @stereotype441 @natebosch @jakemac53 @rakudrama @srujzs @sigmundch @rileyporter @mraleph

leafpetersen, Jul 30 '22 00:07

Constructor parameter lists are more complicated than field declarations, and users want to have complete control over whether things are named or not, and in which order they occur. Fields are trivial in comparison.

Because of that I'd make the declaration a parameter list, allowing the user to get that control:

struct Foo(int x, {int y = 4}) {
  ...
}

Then that will introduce:

  • A final instance variable per parameter, with the name provided (can be private, even for named ones)
  • An unnamed constructor with the provided parameter list, except that private names get the leading _(s) removed, and an initializer list initializing each instance variable using the corresponding parameter.
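A minimal sketch of the two bullets above, written out by hand in today's Dart. The struct declaration in the comment is the hypothetical input; the class name Point and the y getter are assumptions added only so the result can be observed:

```dart
// Hypothetical input (proposed syntax, not valid Dart today):
//   struct Point(int x, {int _y = 4});

class Point {
  // One final instance variable per parameter, keeping the declared name.
  final int x;
  final int _y;

  // The constructor parameter drops the leading underscore, and the
  // initializer list assigns each field from its parameter.
  Point(int x, {int y = 4})
      : x = x,
        _y = y;

  // Not part of the described expansion; exposes the private field for the demo.
  int get y => _y;
}
```

A caller would write Point(1, y: 7), even though the declaration spelled the parameter _y.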

If we allow extending other structs, the parameter list can also contain super.x parameters.

Default values will have to be constant.

If we want to have non-constant default values, we could:

  1. Allow that in general. (Default value expressions can be non-constant, can refer to any prior parameter, and any required parameter, and any type parameter, and static declarations, evaluated in source order). Or,
  2. Introduce new syntax for it, say (int x, int y, {int z ??= x + y}), with the same semantics.
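For comparison, here is how a default that depends on earlier parameters is commonly emulated in today's Dart, where default values must be constant. The class name Sum is made up for the illustration, and the emulation cannot tell an omitted argument from an explicit null, which is one thing either option above would avoid:

```dart
class Sum {
  final int x;
  final int y;
  final int z;

  // `z` is nullable in the signature only; when it is omitted (or null),
  // the initializer list falls back to `x + y`, computed per call.
  Sum(this.x, this.y, {int? z}) : z = z ?? x + y;
}
```

Sum(1, 2) gives z == 3, while Sum(1, 2, z: 10) keeps the explicit value.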

lrhn, Aug 01 '22 12:08

> Constructor parameter lists are more complicated than field declarations, and users want to have complete control over whether things are named or not, and in which order they occur. Fields are trivial in comparison.

+1. I've spent a bunch of time investigating primary constructors (specify the parameters and get the fields for free) and enhanced default constructors (specify the fields and get the parameters for free) and my general conclusion is that parameters contain more configured options than fields, so inferring fields from parameters is less lossy than trying to infer parameters from fields.

Parameters have:

  • Name
  • Type
  • Whether named or positional
  • Whether optional or required
  • If positional, the position
  • A default value
  • Whether the parameter itself is final or not (rarely used)

Fields have:

  • Name
  • Type
  • Initializer expression
  • Whether late or not
  • Whether covariant or not

It would be fairly straightforward to allow late and covariant before parameters in primary constructors and then be able to fully specify the parameters and fields from a parameter list. It would be quite hard to extend field syntax to specify named/positional, optional/required, and what the order is.

On the other hand, I think field declarations visually scale better when you have many of them and they have doc comments and/or metadata annotations.

> Allow that in general. (Default value expressions can be non-constant, can refer to any prior parameter, and any required parameter, and any type parameter, and static declarations, evaluated in source order).

+1.

munificent, Aug 02 '22 21:08

The one issue that I see with saying that these are parameter lists is that I see value in having final be the default. One workaround for that is simply to say that data class means that all fields are implicitly final (including those from the primary constructor).

leafpetersen, Aug 03 '22 05:08

I think final-by-default is correct for a data class. They must be immutable to be safely unboxable (#2362). Inferring the fields from the constructor does not preclude making those fields final: you never wrote the field declarations yourself, so it is not as if you wrote one thing and got something else.

I'm slightly more worried about changing the name of the constructor parameter (in case you want the field to be private, and the parameter to be named). That actually means that what you write as a parameter list is not the parameter list you get.

Adding covariant might be reasonable. Probably not necessary for structs. Adding late seems unnecessary in general. Definitely not needed for structs, it's mutability. For classes using the same shorthand, the field is initialized by the primary constructor. It feels very weird to also have a constructor which doesn't initialize the field. (That's also one of the reasons I suggest that if you have a primary constructor, which defines all the fields, all the other generative constructors should forward to that one. It makes it much easier to reason about the fields.) Not saying it's not possible to come up with an example where it can be useful, but I might be fine with just requiring you to write a normal class for it then.

lrhn, Aug 03 '22 12:08

These topics have been mentioned several times above, but let me refer to the latest comments:

@lrhn wrote:

> Adding covariant might be reasonable. Probably not necessary for structs.

Right, that modifier has no effect on a getter, and struct instance variables do not have setters. I think it is definitely not needed in a struct.

> Adding late seems unnecessary in general

In this case I don't quite agree. We could allow late final instance variables (with or without an initializer) to be declared in the normal style (in the class body), as long as there are no generative constant constructors.

The late final instance variable with an initializer would work just like a getter plus a caching mechanism (and we could use a global map plus a final _identity = Object(); instance variable to associate a struct instance with an identity and emulate that behavior anyway, so the fact that this is not stable doesn't prevent caching from being implemented).
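The emulation mentioned above could look roughly like this in today's Dart. The names Rect, _cache, and evaluations are invented for the sketch, and a plain global map is used as described (it never releases entries, which an Expando would avoid):

```dart
// Counts how often the cached expression is actually evaluated.
int evaluations = 0;

// Global cache keyed by each instance's private identity object.
final Map<Object, int> _cache = <Object, int>{};

class Rect {
  final int w;
  final int h;

  // Gives the instance an identity that does not depend on `this`.
  final Object _identity = Object();

  Rect(this.w, this.h);

  // Behaves like `late final int area = w * h;`: computed on first
  // access, then served from the cache.
  int get area => _cache.putIfAbsent(_identity, () {
        evaluations++;
        return w * h;
      });
}
```

Reading area twice on the same Rect evaluates w * h only once.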

A late final instance variable would presumably not support initialization before the body of the constructor runs (so the primary constructor syntax would not allow us to initialize this kind of variable, it must be initialized "late"). Again, we could implement the checks and the dynamic error to get this behavior anyway, verbosely, so there's not much gained by outlawing it.

eernstg, Aug 03 '22 17:08

We can definitely allow class declarations to have late variable declarations in the normal way, but I don't think I'd want them in the primary constructor. Why be late when the primary way of creating the object always initializes the variable? You can still add the late variable normally, and initialize it in any other constructors.

Struct instances cannot and must not have late fields. If the late field is introduced using the primary constructor, then == and hashCode depend on it too, and will throw if called too early. That's just too big a foot-gun. It's mutability, and we can't have both mutability and unstable identity. If we lose the identity of a struct instance before the late field is initialized, there is nothing we can do to ensure that it won't be initialized twice, on different copies of the original struct, potentially to different values.

> A late final instance variable would presumably not support initialization before the body of the constructor runs

That's allowed today. class C { late final int x; C() : this.x = 42;} works.

lrhn, Aug 08 '22 12:08

> I don't think I'd want them in the primary constructor.

Which is the reason why I suggested that they would be

> (in the class body)

and not in the argument list which turns into the primary constructor. So we agree on that.

> Struct instances cannot and must not have late fields.

Ah, that's interesting! I agree, after a bit of thinking:

The basic semantics of final instance variables (late or not) ensures that they do not mutate (read such a variable twice, and you will get the same result).

However, if a struct instance is boxed, and it contains a late final v = e; which hasn't yet been evaluated, then we could unbox and box it and end up with multiple representations of v, and each of them could get its own value (along with its own side effects). This means that the boxing and unboxing steps will be potentially very significant for the observable behavior of the program.

As seen from the history of each of these copies of the struct instance you could claim that there is nothing wrong with this behavior ("you asked for it"), but it is surely going to be impossible to reason about in practice.

So I agree: We definitely cannot let late initialization occur for a struct instance after an unboxing or boxing step, and there is no reasonable way to enforce that the initialization occurs before the first such event. Hence, no late instance variables are allowed in a struct.
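The hazard can be imitated with an ordinary class in today's Dart. This is only a simulation under stated assumptions: rebox() stands in for an unbox/box step that copies the instance before its late field has been evaluated, and the names Boxed, counter, and seed are made up for the example:

```dart
// Side effect observed by the late initializer.
int counter = 0;

class Boxed {
  final int seed;

  // Evaluated on first read, with a visible side effect.
  late final int v = seed + (++counter);

  Boxed(this.seed);

  // Stand-in for unboxing and reboxing: a fresh copy of the same fields
  // whose late variable is still uninitialized.
  Boxed rebox() => Boxed(seed);
}
```

If a copy is made before v is read, each copy runs the initializer on its own, so two representations of "the same" value can disagree about v.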

eernstg, Aug 08 '22 13:08