Records: Should there be a field access notation for positional fields?


leafpetersen opened this issue

The current record proposal provides no notation for accessing a single positional field:

> Positional fields are not exposed as getters. Record patterns in pattern matching can be used to access a record's positional fields.

There are a couple of implications of this.

First, I think this means that there's significant asymmetry between named and positional fields: you can access a named field directly on the object, but you can't do so for a positional field. So you have a constant-size syntax for reading a single named field, but the only syntax for reading a positional field requires reading all of the fields (out into a pattern match), which is fairly verbose (even if you just use `_` for all of the fields you don't care about).
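To make the asymmetry concrete, here is a small sketch in Dart 3 syntax (which postdates this thread). The record shape and the helper names `labelOf` and `secondOf` are illustrative only. Reading the named field is a single selector, while reading one positional field through a pattern needs a subpattern for every field of the record:

```dart
// Illustrative record shape: three positional fields and one named field.
typedef Sample = (int, String, double, {String label});

// Reading the named field needs only its name.
String labelOf(Sample r) => r.label;

// Reading one positional field with a pattern requires a subpattern for every
// positional field (and the named field), even though only one is used.
String secondOf(Sample r) {
  final (_, second, _, label: _) = r;
  return second;
}

void main() {
  const Sample r = (1, 'two', 3.0, label: 'p');
  print(labelOf(r)); // p
  print(secondOf(r)); // two
}
```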

Second, there's a semantic asymmetry in that you can presumably access named fields dynamically (and hence write code that is polymorphic over records with named fields) but not positional fields:

```dart
void printX(Record record) {
  // Dynamic access to a named field `x`; fails at runtime if there is none.
  print((record as dynamic).x);
}
```

Should we provide a way to read a single positional field, e.g. `record.0`?

cc @munificent @lrhn @eernstg @stereotype441 @natebosch @jakemac53

Aug 05 '22 23:08 leafpetersen

I would like to come up with a positional field accessor expression syntax, yes. I don't have a design yet and I don't think it's essential so the current proposal just says there isn't one.

The proposal initially said each one got a named getter like field0, field1, etc. But that runs into annoying/dumb problems around what if you try to have a named field with that same name? So to keep things simpler and avoid coming up with some solution for those collisions, I just took positional field getters out.

Aug 05 '22 23:08 munificent

Crazy random idea: what if the syntax for positional field accessors is simply the [] operator applied to an integer literal?

E.g.:

```dart
(int, String) x = (3, 'foo');
var b = x[0]; // b has static type `int`
var c = x[1]; // c has static type `String`
var d = x[2]; // Static error: no operator [2] in (int, String)
var e = x[0 + 1]; // Static error: no operator[] in (int, String)
```

The CFE could desugar accesses like x[0] into property gets using a name that would otherwise be invalid (e.g. %field0), so there's no possibility of conflict with existing fields.

Probably a terrible idea, but just thought I'd throw it out there :)

Aug 07 '22 02:08 stereotype441

@stereotype441 I think the positional access syntax should work when the receiver has static type dynamic to exactly the same degree as it will for named fields.

If we decide that we must implement toString to show the names of the fields then we have enough metadata to implement a read-only asMap() view of the record, so we could implement a general indexer, and use your suggestion of special static typing rules for a literal or constant index value when the receiver is a known record type.

However, I would rather not have that metadata in the compiled program. Unlike function types, the names are not needed for type checks (the subtype needing a superset of the named parameters).

Aug 07 '22 20:08 rakudrama

I tend to prefer @munificent's first proposal (using names like field0 .. fieldN for the positional components, although we might of course use a different specific name than field...). There could be name clashes, but that's not a breaking change because there are no records now. I'm sure we can think of a naming scheme which is reasonably readable, and unlikely to clash with names that developers actually want to use with named components.

This allows the mechanism to be consistent and convenient:

With respect to parsing, and comparing with the alternative myRecord.0: We wouldn't need to handle special member names like 0, 1, ... in the parser. For instance, could 1.5 be an attempt to look up a positional record component in 1? What if we introduce implicit constructor invocations that allow us to turn 1 into a record?

Comparing with `myRecord[0]`: In a dynamic invocation `myDynamic[someExpression]`, would we enforce the constraint that `someExpression` must be an integer literal, or at least a constant expression? If we do support iterating over all positional components of a record using `try ... (myRecord as dynamic)[i++] ... catch`, shouldn't we also support iterating over all named components? Why not all named members of instances of classes? ;-)

With respect to the practical value of accessing positional components using normal getters: Developers can use r.field2 in the middle of an expression. It might be quite inconvenient to have to introduce a pattern matching construct at that point.

Dynamic invocations: It seems likely that we can support dynamic invocations, even if the runtime uses a more compact representation for positional components and their names than they do for named components.

There would be other corner cases, for example: It would probably not be possible to introduce a non-trivial noSuchMethod of a record type, but the one in Object would at least have a meaningful memberName to print (comparing again: #0 is not a symbol).

Aug 08 '22 08:08 eernstg

I've suggested record[0] before (can't find where). It works, and the main issue is that it looks like the index operator, but is actually a special record syntax which requires a constant operand (doesn't have to be a literal, any constant will work).

When I last discussed this, I gravitated towards `.0`. It's not syntax which otherwise exists (except as part of a double literal, `1.0`), and the lexer already handles that ambiguity: `1.0` is a double literal, so if you don't want it to be a single token, you need parentheses (or maybe just spaces, though that could get ugly?).

```dart
var r = (42, 37, foo: 87);
print("(${r.0}, ${r.1}, ${r.foo})");
```

would work.

We can make it work for dynamic invocations too; it'll just fail if the target is not a record with that many positional elements. Since the grammar is specific to record member access, we won't be introducing a way to do dynamic record lookup by going through `dynamic`, like `[0]` would: `Record r = ...; var nth = (r as dynamic)[n];`.

That too is a reason for me to not allow `[0]`. I do not want runtime introspection. If a compiler recognizes that nobody ever uses the `.1` field of a `(int, int, int, foo: int)` record, it should be allowed to optimize it away. Even doing `(o as dynamic).1` will probably void that optimization, but doing `(o as dynamic)[n]` is much more likely to happen (e.g. when people are parsing JSON). That's also another reason I don't want `toString` to be clever, and would prefer it just return `Instance of Record`. Having it include all fields means not being able to optimize fields away!

I wouldn't make (o as dynamic).0 hit noSuchMethod on o, since .0 is not an object member at all. (But then, I'd be fine with not hitting noSuchMethod for any member which isn't part of the interface of o to begin with.)

(We can even, in some hypothetical future, choose to allow classes to declare positional members, named 0, 1, etc., if we want to. They must be consecutive and start at 0. I don't have a syntax ... yet!)

Aug 08 '22 09:08 lrhn

I'm inclined to agree with the analysis from @lrhn above.

Aug 09 '22 20:08 leafpetersen

> I tend to prefer @munificent's first proposal (using names like field0 .. fieldN for the positional components... I'm sure we can think of a naming scheme which is reasonably readable, and unlikely to clash with names that developers actually want to use with named components.

I think reserving names like field0 is a good thing because if someone starts writing named fields that are essentially just "1, 2, 3...", then those fields don't need to be named -- they might as well be positional. Users would (probably) get a warning saying their field names are conflicting with the positional identifiers and they'd be able to think about whether they really need named fields. Then if they really want, they can change to something else like myFirstField.

Then you get all the benefits of myRecord.0 without the unusual grammar. Even the dynamic lookup is no different than doing (o as dynamic).namedField.

Aug 10 '22 16:08 Levi-Lesches

We discussed this in the language meeting today. We agree that some expression syntax for accessing positional fields is important. (In particular, I find compelling Leaf's point that, without an expression syntax, accessing the nth field requires a pattern with at least n subpatterns, which can be very verbose.)

We haven't settled on a syntax. A few options we're considering (most already mentioned here):

`record.0`, `record.1`, etc.

This would be a new syntax. We'd treat each of these like separate operators and not a single "positional field" operator that takes an index as an argument since we need separate return types for each field. Lexically, we'd treat . and the integer as separate tokens, but the parser would treat them as a single conceptual unit.

The `.` and a receiver before it are both required. Inside an extension on a record type, you could not simply use `0` as an implicit self send to access the zeroth field!

Pros:

- It's extremely terse, basically as short as you can get.
- It can't collide with any named field names.

Cons:

- It's new syntax, which is always fairly costly in terms of complexity and implementation effort.
- It makes it harder to ever support any future syntax that allows multiple adjacent expressions, since that would now become ambiguous with an identifier expression followed by a double literal. (Adjacent expressions are already hard to support because `-`, `[`, and `(` all have both prefix and infix expression forms.)

`record[0]`, `record[1]`, etc.

In other words, reuse the existing subscript operator syntax. But, in order to handle the heterogeneous types of the fields, we require the index to be an integer value known at compile time.

Pros:

- No new syntax.
- Can allow constant expressions to refer to field indexes in addition to integer literals.

Cons:

- Potentially confusing to users that the index expression must be a constant expression.

`record.field0`, `record.field1`, etc.

Just come up with some prefix like field.

Pros:

- No new syntax or static semantics. It's just auto-generated getters.

Cons:

- `field` is pretty verbose.
- Have to deal with collisions with named fields using the same name. For records themselves, this isn't really a problem: just don't do that. But if/when we want to be able to spread records to argument lists, we may encounter parameter lists that have named parameters that do collide with these.

`record.$0`, `record.$1`, etc.

Like the previous suggestion but using $ as the prefix, which is already a valid Dart identifier.

Pros:

- No new syntax or static semantics. It's just auto-generated getters.
- Shorter than `field`.

Cons:

- Could still technically collide, though the odds of there being named parameters named `$0`, etc. are quite slim.
- Looks weird in string interpolations. Though you would almost always be using the braced form of interpolation anyway, since the record you're accessing the field on is unlikely to be `this`. `"some string ${record.$0}"` isn't that hard to read.
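For reference, Dart 3 (released after this discussion) adopted a variant of this last option, with one difference: the generated getters are one-based (`$1`, `$2`, ...) rather than starting at `$0`. A minimal sketch of reading positional fields that way, including inside braced string interpolation; `describe` is a hypothetical helper:

```dart
// Each positional getter has its own static type: r.$1 is int, r.$2 is String.
String describe((int, String, {bool flag}) r) =>
    'first: ${r.$1}, second: ${r.$2}, flag: ${r.flag}';

void main() {
  print(describe((42, 'answer', flag: true)));
  // first: 42, second: answer, flag: true
}
```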

Still an open discussion.

Aug 17 '22 22:08 munificent

A topic we haven't discussed yet that I think could affect this decision is code that is polymorphic over tuple arity. Right now, there's no plan to be able to write code that works with a record type and is generic over how many positional fields the record has.

I suspect that kind of use case will come up. For example, one of the approaches to handle awaiting records (#2321) is defining a set of core library functions like:

```dart
Future<(T1, T2)> wait2<T1, T2>(
    Future<T1> future1, Future<T2> future2) {
  ...
}

Future<(T1, T2, T3)> wait3<T1, T2, T3>(
    Future<T1> future1, Future<T2> future2, Future<T3> future3) {
  ...
}
```
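To show how much per-arity boilerplate is involved, here is one hypothetical way a `wait2` body could be written in Dart 3. It is a sketch under that assumption, not the implementation proposed in #2321: it waits on both futures with `Future.wait` and repackages the untyped results as a typed record.

```dart
// Hypothetical wait2: wait for both futures, then rebuild a typed record.
// The casts are needed because Future.wait returns a homogeneous List.
Future<(T1, T2)> wait2<T1, T2>(Future<T1> future1, Future<T2> future2) async {
  final results = await Future.wait<Object?>([future1, future2]);
  return (results[0] as T1, results[1] as T2);
}

Future<void> main() async {
  final (n, s) = await wait2(Future.value(1), Future.value('one'));
  print('$n $s'); // 1 one
}
```

A `wait3`, `wait4`, and so on would repeat this almost verbatim with one more type parameter, argument, and cast each, which is the tedium described here.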

Having to define separate functions for each arity up to some arbitrary maximum is pretty tedious. C++ introduced a notion of parameter pack to allow templates to write code that can be more flexibly generic over this kind of boilerplate.

We could probably tackle this in Dart just using macros. But if we want more graceful support for this kind of code, we might want to support variadic generics and a way to build records and record types out of the corresponding type parameter lists.

If we do that, then the code working with those generic records may need to access positional fields in an abstracted way. I think the record[n] syntax could handle that fairly gracefully, but a syntax that bakes integer literals into identifiers less so.

I'm not sure if this is an important constraint (there are many many open questions of how variadic generics would work), but I wanted to put it out there.

Aug 17 '22 22:08 munificent

In https://github.com/dart-lang/language/issues/2388#issuecomment-1207909198 a syntax allowing runtime introspection is listed as a con.

In https://github.com/dart-lang/language/issues/2388#issuecomment-1218558975 runtime introspection is listed as a potential future enhancement.

Do we need to separately figure out where we land on this before choosing a syntax?

Aug 17 '22 23:08 natebosch

> In #2388 (comment) runtime introspection is listed as a potential future enhancement.

In that comment, I'm not necessarily assuming that some kind of variadic generics would rely on runtime introspection. I would definitely prefer that it get expanded statically at compile time, though that certainly raises lots of questions. It may be that the right answer is to lean on macros for this.

> Do we need to separately figure out where we land on this before choosing a syntax?

Not necessarily. I think we can pick whatever syntax we need for this and it won't entirely paint us into a corner if we later want to be able to write code that's polymorphic over record arity.

Aug 17 '22 23:08 munificent

I personally like $0, it indicates "zero" while being clear it's a language-provided construct, whereas field0 looks like a human-written getter.

> Inside an extension on a record type, you could not simply use 0 as an implicit self send to access the zeroth field!

It would also look a little weird to see this, but I can see people getting used to it.

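```dart
import 'dart:math' as math;

extension on (num, num) {
  // Dart has no `**` operator, so square by multiplying.
  double get distanceToOrigin => math.sqrt($0 * $0 + $1 * $1);
}
```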
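For comparison, the same extension compiles in Dart 3 as shipped, where the positional getters are one-based (`$1`, `$2`). Because they are ordinary getters on the record type, an unqualified `$1` inside the extension resolves to `this.$1`. A minimal sketch:

```dart
import 'dart:math' as math;

// `$1` and `$2` are implicit `this.$1` and `this.$2` inside the extension.
extension on (num, num) {
  double get distanceToOrigin => math.sqrt($1 * $1 + $2 * $2);
}

void main() {
  print((3, 4).distanceToOrigin); // the distance from (0, 0) to (3, 4)
}
```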

Aug 18 '22 00:08 Levi-Lesches

All the suggested syntaxes work as selectors, so they can be used with null-aware access (r?.0, r?[0], r?.$0) and cascades (r..0.action()..1.action(), etc.). That's good.

The `.0` syntax can have a parsing problem if chained: `r.0.0` will tokenize as "identifier, dot, double literal". We can probably work around that (special-casing the tokenization of a double literal after a `.`), but it's an extra complication. If you do `dynamic r = (1, 2) as dynamic; print(r.0);`, should it work? It can, and for consistency, it probably should. If you then do `dynamic r = MyClass() as dynamic; print(r.0);`, what should happen? Should it call `MyClass.noSuchMethod`, or just fail like `print(!r)` would, being another non-overridable operator not supported on the value? (Not entirely the same: `!r` fails because `!` introduces a `bool` context, and the implicit downcast to `bool` fails. There is no implicit downcast to a specific record type for `r.0`, and records with at least one positional element do not have a shared supertype which supports `.0`.)

The `r[0]` syntax interacts badly with dynamic access. On a record, it's a special operator, like `.0`, and must have a constant number. If you do `dynamic r = (1, 2) as dynamic; print(r[0]);`, should it work, or should it fail to find `operator[]`? The record operation is not the index operator (`operator[]`) of an object; the typing is different, and it is instead a special operator per index value, so the dynamic invocation will likely fail. What if it was `print(r[fib(1)])`, a non-constant value? I'd say that must not work, because otherwise we've introduced functionality that's only available through dynamic invocations.

The `$0`/`field0` names work with dynamic invocations. They're not special in any way. You can inspect a record to find its number of positional elements (up to a limit set by your source) by trying to dynamically read `$0`, `$1`, ... `$999` until it throws. That's OK. It won't tell you the named elements, and if you know it's a tuple (no named elements), you could just try casting to `(Object?, Object?, ..., Object?)` instead and see if that worked. This is definitely the solution with the least amount of new moving parts. Not the prettiest, but likely completely serviceable.
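A sketch of that probing, written against Dart 3, where the getters are `$1`, `$2`, ... rather than `$0`. It assumes that dynamic reads of record field getters work and that reading a missing one throws `NoSuchMethodError`. The names `positionalProbes` and `countPositional`, and the limit of four probes, are hypothetical. Because each getter name must be spelled out in source, the probe list is the upper bound:

```dart
// Hypothetical probe list: each getter name has to be written out in source,
// so the list length is the upper bound on what can be detected.
final List<Object? Function(dynamic)> positionalProbes = [
  (r) => r.$1,
  (r) => r.$2,
  (r) => r.$3,
  (r) => r.$4,
];

// Counts positional fields (named fields are not counted) by reading `$1`,
// `$2`, ... dynamically until a getter is missing.
int countPositional(Object? value) {
  var count = 0;
  for (final probe in positionalProbes) {
    try {
      probe(value);
      count++;
    } on NoSuchMethodError {
      break;
    }
  }
  return count;
}

void main() {
  print(countPositional((1, 2))); // 2
  print(countPositional((1, 'a', 3.0, x: 4))); // 3
  print(countPositional(42)); // 0
}
```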

Aug 18 '22 07:08 lrhn

> The .0 can have a parsing problem if chained: r.0.0 will tokenize as "identifier, dot, double-literal". We can probably work around that (special casing the tokenization of a double literal after a .), but it's an extra complication.

In that case, the following code would become possible:

```dart
extension on double {
  Record call() {
    return (foo, (("", 3), 3.14));
  }
}

extension on Record {
  double operator* (double other) => 3.14;
}

void foo() {
  print("foo");
}

main() {
  3.14().0();
  3.14().1.0 * 1.1;
  1.1 * 3.14().1.1;
}
```

Aug 18 '22 10:08 sgrekhov

True, and if we then extend `.0` member access to classes, say as `operator 0 () => ...`, then it'll probably be only seconds before we see things like:

```dart
var ipv4 = ip.192.168.0.1;

var time = T.11.13.25.pm;
```

or similar shenanigans.

Let's ... not do that then.

(Not an entirely new possibility; it just looks better than `T(11)(13)(25).pm` or `T[11][13][25].pm`.)

(I probably wouldn't allow `.0` on something of type `Record`; it needs a real record type which guarantees that the `0` field exists. But

```dart
extension on (Object?, ((String, int), double)) {
  ...
}
```

should work.)

Aug 18 '22 11:08 lrhn

I like $0

Aug 18 '22 21:08 natebosch

OK, the parsing and readability problems of .0 have convinced me that's the wrong path.

Another problem with [] is that it could collide with the normal subscript operator:

```dart
extension RecordSubscript on (int, int) {
  int operator [](int index) => 3;
}

main() {
  print((1, 2)[0]);
}
```

Should this print 1 or 3? Is it an error to define a [] operator on a record type? What if you define it as an extension on a supertype of records?
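For what it's worth, in Dart 3 as shipped (one-based `$1` getters, no built-in `[]` on records) this particular question does not arise: the extension's operator is the only `[]` available, so it wins, while the built-in field is reached through `$1`. A minimal sketch:

```dart
// Records have no built-in `[]` in Dart 3, so the extension operator applies.
extension RecordSubscript on (int, int) {
  int operator [](int index) => 3;
}

void main() {
  print((1, 2)[0]); // 3, from the extension
  print((1, 2).$1); // 1, the built-in positional getter
}
```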

It seems like no one likes field0. That leaves $0. I'm OK with it. If no one complains (@leafpetersen @eernstg @jakemac53 @stereotype441 @kallentu), I'll add that to the proposal.

Aug 18 '22 22:08 munificent

Would it be possible to use the name of a positional field if a name is provided in the type?

```dart
(int x, int y) position = (5, 10);
print(position.x);
```

Aug 19 '22 00:08 mmcdon20

I believe the point of having positional fields is to deliberately not expose the names of the fields in an API, similar to how you can't pass positional arguments by their names in function calls.

Aug 19 '22 02:08 Levi-Lesches

> I believe the point of having positional fields is to deliberately not expose the names of the fields in an API, similar to how you can't pass positional arguments by their names in function calls.

If you have a record with type (int x, int y) you would still create the record by passing in the values according to position rather than by name.

But once those arguments are passed in, you then refer to positional parameters by their names within the definitions of functions. Their names do not affect the signature of the function, but having a name to refer to them by is more convenient than referring to them by their position.

I don't think using the names for positional fields of records in this way would be that much different. The names would not affect the record's shape, and you would be able to provide whatever names you want for the positional fields.

```dart
(int latitude, int longitude) getPosition() {
  return (5, 10);
}
print(getPosition().latitude); // okay

(int x, int y) position = getPosition(); // providing different names for positional fields is okay
print(position.x); // okay
print(position.latitude); // error

print(position == getPosition()); // true
```

There is the downside that changing the name of a positional field would be a breaking change. Not sure if there are other downsides/potential issues that I am missing.

Aug 19 '22 04:08 mmcdon20

No complaints, go with $0 etc.

We could, if we wanted to, define `$0` as a magical extension method. We could do that for named fields too.

That is, we could act as if the platform libraries exposed an infinite set of unnameable and unhidable extensions, one for each record type (tree-shaken to one for each record-shape which exists in the program), so that for (_, x: _) we have:

```dart
extension $$CantTouchThis$$<R, T> on (R, x: T) {
  R get $0 => switch (this) { case (it, x: _) => it; case _ => throw WatError("unreachable"); };
  T get x => switch (this) { case (_, x: it) => it; case _ => throw WatError("unreachable"); };
}
```

(But much more efficient, obviously!)

Then imperative record destructuring will only work at the static type of a record. You wouldn't be able to get to record fields through dynamic dispatch.

I think the only advantage of doing so is that we won't complicate dynamic invocations any further, or require retaining the run-time information needed to perform such dynamic field gets.
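The effect of this idea can be approximated in user code in Dart 3, which shipped later with the named-field type syntax `(R, {T x})` instead of `(R, x: T)`. A minimal sketch; the extension name `FirstOfPair` and the getter `first` are hypothetical. Because extension members are resolved from the static type, the getter is not reachable through `dynamic`, which is the trade-off described above:

```dart
// Exposes the positional field through a statically resolved getter.
extension FirstOfPair<R, T> on (R, {T x}) {
  R get first => switch (this) { (var it, x: _) => it };
}

void main() {
  final r = (1, x: 'a');
  print(r.first); // 1
  // (r as dynamic).first would throw NoSuchMethodError at runtime:
  // extension members are not visible to dynamic invocations.
}
```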

Aug 19 '22 12:08 lrhn

> Would it be possible to use the name of a positional field if a name is provided in the type?

No, the positional field name is not part of the record's type. You could have multiple positional records that have different names for a given positional field and all are considered to have the same type and are freely assignable to each other. That means there's no reliable way to know which position a given field name should correspond to.
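This is how positional field names ended up behaving in Dart 3: they are documentation only. A minimal sketch, reusing the `getPosition` example from above, in which two annotations with different names denote the same record type and the field is read positionally with `$1`:

```dart
// The names `latitude`/`longitude` are not part of the record type.
(int latitude, int longitude) getPosition() => (5, 10);

void main() {
  (int x, int y) position = getPosition(); // different names, same type
  print(position == getPosition()); // true
  print(position.$1); // 5
  // print(position.x); // compile-time error: no getter named `x`
}
```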

Aug 19 '22 18:08 munificent

> You wouldn't be able to get to record fields through dynamic dispatch.

I'd consider this a positive feature.

Aug 19 '22 22:08 natebosch
