language icon indicating copy to clipboard operation
language copied to clipboard

[records] parse errors when trying to annotate a record type class field

Open pq opened this issue 1 year ago • 35 comments

The following code generates what look like parser errors.

image

error: Annotation must be either a const variable reference or const constructor invocation. (invalid_annotation at [sandbox] lib/public.dart:27)
error: Variables must be declared using the keywords 'const', 'final', 'var' or a type name. (missing_const_final_var_or_type at [sandbox] lib/public.dart:28)

/fyi @jensjoha

pq avatar Sep 07 '22 17:09 pq

My first guess is that we're parsing the record type as an argument list for the annotation. Not sure whether the proposal says how we should disambiguate this case.

@munificent

bwilkerson avatar Sep 07 '22 17:09 bwilkerson

Ouch, that is an ambiguity. Not sure there is any good solution.

@foo 
(int, int) Function(int) bar;

is completely ambiguous.

There is no simple disambiguation. We can't add something to @constantName to show that it doesn't want the argument. We can't add something to (int, int) Function(int) to show that it does want the (otherwise optional) return type.

The best hacks would be:

@foo 
@DUMMY()
(int, int) Function(int) x;

or

typedef typeof<T> = T;

@foo 
typeof<(int, int) Function(int)> x;

Neither is something we can reasonably ask people to do.

lrhn avatar Sep 07 '22 18:09 lrhn

@stereotype441

bwilkerson avatar Sep 07 '22 20:09 bwilkerson

@leafpetersen

pq avatar Sep 08 '22 16:09 pq

I guess one (unfortunate) option is to disambiguate based on space. @foo(...) is an annotation constructor call, @foo (...) is an annotated record type. We'd want to be sure that dartfmt didn't change the spacing in that case though.

leafpetersen avatar Sep 08 '22 17:09 leafpetersen

Moving this over to the language repo because this is an ambiguity in the actual language feature and not a parser bug. We'll have to change the proposal to handle this. :(

munificent avatar Sep 08 '22 17:09 munificent

One idea is to put Record at the beginning of record types, like:

class C {
  @deprecated
  Record(int, int) pair;
}

@foo
Record(int, int) Function(int) bar;

I don't like the verbosity, but it's consistent with function type syntax and resolves this ambiguity. It also makes the zero-element record type Record() instead of () record, which I kind of like because the latter feels a little terse and opaque for a type annotation.

I'll think about this some more and see if I can come up with any other ideas.

So how about them types on the right?

munificent avatar Sep 08 '22 17:09 munificent

Another (probably bad) idea: We could extend the language to support named type parameters. That could arguably be useful for other user-defined generic types because (as with optional named parameters) it would let use sites omit the type parameters can be inferred/defaulted and write only the ones the user cares about.

If we had that, then record types (and in principle function types) could look like Record<int, String s>.

But given that we already have function types which use parentheses and users seem to be OK with, this feels like the wrong approach.

munificent avatar Sep 08 '22 18:09 munificent

So we can either change the record type grammar, or the annotation grammar, or add something that users can use to disambiguate in cases where there is ambiguity.

The things we can do without changing the grammar are annoyances to users, and pretty much undiscoverable (like the typeof<T> wrapper).

We could introduce <type> as grouping parentheses for types, without the leading typeof, so you can do @foo <(a, b)> bar(){} and have (a, b) be a type instead of a parameter list. Will probably cause other ambiguities instead, though.

We can change the annotation grammar. Lots of options, all of them breaking and requiring migration, but probably something that can be automated.

We could decide to disallow annotations without parentheses. It would be slightly annoying for a few annotations. Mainly @override, which should be a language feature anyway. And probably a bunch of others that I don't know because I never use them anyway.

We could change the annotation grammar entirely to use, say, the C#-style: [expression] instead of @expression. If we are changing something, we might as well change everything 😁 .

Or we can introduce some sort of delimiter you can put after annotations to separate them from the following thing. Trailing comma? Optional semicolon after annotations? (Giving people optional semicolons, just not the ones they want!)

Changing the record type grammar is less breaking because it doesn't exist yet, so we won't break any code. It's just so. darn. simple. right now, and any change will make it worse.

Options like Record(current record type) or #(current record type) disambiguate a lot. They're also verbose and somewhat unintuitive. The # is better than Record, mainly from being shorter, and it won't conflict with symbols. Both are better than using Record<...>, that looks too much like a generic thing. I'd use the same rules as for Function: if followed by ( it's parsed specially, otherwise it's just the type Record itself. Also, then the type won't look like the destructuring syntax, or the literal syntax.

lrhn avatar Sep 08 '22 18:09 lrhn

FWIW, I actually find Record(int, {String b}) to be intuitive, and I especially like the symmetry with function types because the two kinds of types have a lot of similarity.

bwilkerson avatar Sep 08 '22 18:09 bwilkerson

I think my favorite of the suggestions above is using Record(int, {String b}) to denote a record type. I really like the similarity to what we do for function types.

One other random idea: add an optional : to the end of the annotation syntax (which maybe we make required in Dart 3.0). Then the user disambiguates via the placement of the colon, e.g.:

@deprecated:
(int, int) pair;

vs.

@Deprecated('message'):
foo;

If the user leaves off the : we should assume the thing in parentheses is an annotation argument, for backwards compatibility.

stereotype441 avatar Sep 12 '22 13:09 stereotype441

FWIW, I lean towards Record(int, {String b}) for record types too and agree that the symmetry with functions is nice.

pq avatar Sep 12 '22 15:09 pq

I like the Record(actualRecordType) syntax on principle, it's consistent, readable and easy to parse, but I worry it is too verbose.

When you want to use a record type, it's in a place where you don't want the ceremony of introducing a class. Adding extra verbosity to those places makes it less convenient, and might just remove some of the lure of having records. (Not all, but some.) Or maybe I'm over-thinking it.

lrhn avatar Sep 12 '22 16:09 lrhn

Looking on the motivation in the specification, I see three reasons: (1) no behavior, (2) verbosity, and (3) coupled to a class definition.

Surprisingly, I don't see performance, i.e. I'd expect that because we don't need allocating a new object instance, and then GC it, and instead just return a few values on the stack, it is slightly faster.

I don't understand (3) as well. What is wrong with being coupled to a class name?

For (2) I think the biggest issue is not verbosity itself, but non-locality. I have to scroll down to the bottom of the file to declare a new class that I will return from this method. Not a huge work, I admit, but still. Typing Record is still more lightweight than scrolling. Although of course adds some verbosity.

And I also like the symmetry with Function.

scheglov avatar Sep 12 '22 16:09 scheglov

I'll admit, I also find Function too verbose. And I can't even typedef my way around it, because even with typedef Fn = Function, the int Function(int) syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.

Compare

Verbose Concise Overly Brief?
Record(int, int) Rd(int, int) #(int, int)
int Function(int) int Fn(int) (int)->int

I think a point could be made for shorter keywords. :)

lrhn avatar Sep 13 '22 08:09 lrhn

Could you make the word Record optional perhaps? Allowing both Record(int, int) and the shorter (int, int)? The latter would probably be used in most cases, but if you needed to precede it with an annotation (or some other ambiguous case comes up down the line) or you just wanted to be more explicit, you could write out the full Record.

Jetz72 avatar Sep 13 '22 13:09 Jetz72

And I can't even typedef my way around it, because even with typedef Fn = Function, the int Function(int) syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.

Could we fix this?

mateusfccp avatar Sep 13 '22 17:09 mateusfccp

I prefer the more concise syntax for records types. I agree with @lrhn that already Function is a bit verbose.

Compare:

Record(String, String) doSomething(Record(String, String) Function(Record(int, int), String Function(int)) transform) {
 return transform(((1, 2), (i) => '$i'));
}

with the current (with allowing Fn)

(String, String) doSomething((String, String) Fn((int, int), String Fn(int)) transform) {
...
}

Or:

doSomething(transform: ((int, int), int -> String) -> (String, String)) -> (String, String) {
...
}

With disallowing : for default named parameters I think @munificent hinted that would make it easier for types on the right, but that would be a hugely breaking change. Would that make it easier for optional semicolons and other requests though? :) One can hope.

In any case, I know this is a contrived example, but Record seems unnecessarily wordy. Especially for a language that has void main() rather than public static void main(String[] args).

I like @stereotype441's suggestion of adding an optional : after metadata (could even do a ;) for disambiguation, but if you really want a prefix I liked @lrhn's suggestion of #(int, int)

TimWhiting avatar Sep 13 '22 18:09 TimWhiting

Another idea that came up in this morning's language team meeting is to disambiguate the two types of annotations based on whether there's whitespace between the identifier and the (. So e.g. @foo(...) would parse as an annotation that constructs an instance of foo, whereas @foo (...) would parse as the annotation @foo followed by a record type.

Although technically breaking, this disambiguation rule would probably do the right thing in 99.9% of cases.

stereotype441 avatar Sep 14 '22 16:09 stereotype441

Summarizing discussion from this morning.

There are two broad approaches we could take: do something with the annotation syntax to disambiguate, or change the record type syntax.

On the first approach:

  • We could require (or allow) a colon after the annotation to terminate it. That is, an annotation becomes either @name: or @name(...):.
    • This would be very breaking to existing code
    • This would be noisy, people generally like the existing syntax.
  • We could always require parentheses on annotations
    • Very breaking
    • Noisy for common cases
  • We could disambiguate based on the presence/absence of return type. That is, we could say that for methods, the method declaration "greedily" eats the type, so if there is a parenthesized list before a method declaration name, it is always interpreted as a record type. - Probably very breaking - Ugly to parse - Not clear it covers all of the cases
  • We could disambiguate based on whitespace: @name( always starts an annotation constructor call, and @name ( is always an annotated record type. - This may be ugly to implement - Probably makes the grammar more rigid, harder to change in the future - Technically breaking, but in practice probably not - Code generators would have to be aware of this, but probably not an issue (they need to add whitespace to avoid accidental concatenation anyway).

On the second approach:

  • Using generic method syntax (e.g. Record<type1, {type2 name}>) was not appealing
    • Strange to have named parameters in there
  • Using Record(type1, {type2 name}) could be done.
    • Pretty broad agreement that this is verbose, nobody excited about it
    • Some subset of the team felt it's also pretty ugly, but not universal
    • Consistent with the Function case
    • Makes multiple return type method declarations much less "symmetric"
  • Using #(...) just for the type level would work, somewhat consistent with Javascript proposals which use #{ and #[
    • Harder to search for
    • A bit punctuation heavy
    • Inconsistent with the term level
  • Using the #(...) syntax uniformly for types, expressions, and patterns could work
    • More consistent, avoids some issues with parenthesized expressions

There isn't broad agreement on the team as to a direction yet, we'll keep discussing this.

leafpetersen avatar Sep 14 '22 17:09 leafpetersen

One possibility that we didn't discuss yet is using a different infix operator for the type syntax. An idea which probably doesn't work is to use , without parentheses:

  • int,int foo(int, int) => (3, 3) is a function returning a pair of integers.
  • This is probably ambiguous: is foo(int, int x) a function expecting a single parameter named x of type int, int, or a function expecting a parameter named int of type dynamic and parameter named x of type int?

The ML family of languages uses infix *.

  • int * int foo(int x) is a function returning a pair of integers
  • This seems unambiguous for the annotation case: @deprecated int * int is fine?
  • We would need to rely on precedence for things like int * int Function(int): is it a function returning a pair, or a pair with a function typed component?
    • The ML family of languages makes * bind more tightly
  • There may be other ambiguities lurking here
  • This gives no syntax for the empty or unary tuple types.
    • we could use () for empty, but we still have the same ambiguity to work around
    • We could define these as primitive named types
  • This will look very weird for the named parameters, especially if there are no positional parameters
    • int * {int x, int y}
    • {int x, int y}
    • Though arguably, the latter is a "feature": you don't need to nest the delimiters for records with no positional fields, so it's almost as if you have a separate record syntax.
    • Is this really not ambiguous somewhere?
      • [Edited to add]: Clearly we would need to be able to parenthesize optionally for nested tuples.

A variant of the above would be to use parentheses with an infix * as a delimiter:

  • (int * int) foo(int x) is a function returning a pair
  • We still have the empty tuple problem ((*)?)
  • We could use the trailing * trick for the singleton tuple: (int*)

Other delimiters we could use (with or without parentheses):

  • int & int or (int & int)
  • int ^ int or (int ^ int)

leafpetersen avatar Sep 14 '22 17:09 leafpetersen

I had hoped to stay out of this discussion, but feel I have to voice my dislike of using spaces (and lack of spaces) as disambiguation. On an objective level, we've never done anything like that before and e.g. actively chose a potential exponential (as I recall) parsing for disambiguating between null-aware bracket or a conditional. As also mentioned elsewhere here it will also invalidate assumptions for everything having to do with dart code (say for instance code generation and code formatting). On a completely subjective level I also just don't like it.

(And now that I'm already writing a comment:)

As for, e.g., Rd I'd point out that I'd probably not get through a code review with code including such a weird abbreviation.

As for having two different syntaxes so that one can use one in normal cases and the other if the normal case is ambiguous seems complete bonkers to me. To me, at least, having a single consistent syntax would be much preferable.

Also I personally don't mind the verbosity of Record. My mind would have an easier time remembering that, than some (combination of) punctuation(s) or abbreviation(s). That it becomes too long when nesting enough of them doesn't sound like a good argument to me --- with enough nesting it's going to be completely unreadable no matter what we do.

jensjoha avatar Sep 15 '22 13:09 jensjoha

The Record(...) syntax is definitely the safe and rational choice.

It's consistent with Function. It's safe in all the right ways, least likely to set us up for problems in the future, because the grammar is well known. Accepts all record types without problems.

Using significant whitespace may fix this particular problem, but it leaves us wide open to the next conflict with an untagged parenthesis occurring after something else. It locks us down with regard to what other syntax we can add. I'd be happy using the significant whitespace solution, but I don't think it's the most forward-looking choice. (Edit: And we already had another one: on (e, s) {} after a try. Raw parentheses are darn dangerous.)

The problem with Record(...) is that it doesn't bring me joy. And I know how much writing Function annoys me today. It's a constant annoyance, something abrasive that adds to my daily amount of wear and tear every single time I write it. I doesn't go away. It doesn't get better. It's just "here we go again". It's a cost of using Dart that I have to write Function when I want a function type. And I think that matters.

In that comparison, I'd prefer #(...) for the type, and not for the values, even if it takes up precious grammatical space, because it will keep records a low syntactic overhead. I won't mind writing it. It's still searchable (#( is unique). And if we can't use our syntax for something as central as the product type, what can we use it for?

lrhn avatar Sep 15 '22 20:09 lrhn

I really like the colon suggestion from @stereotype441. I think allowing it (not making it required) wouldn't be a breaking change right? The point is there needs to be some delimiter between annotations and what they're annotating. Of all the solutions presented, it feels like the cleanest.

Also this applies to annotated parameters as well. int foo(@annotation (int, int) latLong) has the same ambiguity.

ds84182 avatar Sep 15 '22 20:09 ds84182

Sometimes I feel like the parenthesis are extra unnecessary code, let's say, compared to Golang.

Example: int, int getOrigin() { return 0, 0; } vs (int, int) getOrigin() { return (0, 0); }.

However, it's not too much code and may make the code more readable.

In my opinion, having to write Record defeats the main advantage of records by requiring so much boilerplate and making the code not look good.

Wdestroier avatar Sep 15 '22 22:09 Wdestroier

As another consideration. Right now the Records feature does not break user written classes named Record because the user written declaration shadows the dart:core Record, and the record type does not need the name Record in it. If you make it mandatory to write Record as part of the type, it might make the disambiguation a little weird.

// User written
class Record {}

Record(int, int) f0(Record(int, int) value);

Record f1(Record value);

I'm assuming you would do the right thing here and resolve to the dart:core Record type for f0, but the user written class for f1, (they are disambiguated because of the parentheses, but it definitely makes it confusing to read).

TimWhiting avatar Sep 16 '22 14:09 TimWhiting

We'd have to be absolutely certain that we can distinguish whether Record(int) should be a type or a constructor invocation. We generally can do that. A lot of Dart would break down if we couldn't distinguish whether a position should be a type or an expression, since identifiers can be both. So it would probably work out. Like for Function, the following ( force-disambiguates it into a type in a type-position, but not in an expression position.

lrhn avatar Sep 16 '22 14:09 lrhn

I can't think of any true ambiguity (like you say the following ( disambiguates like Function) but there will be a lot during editing. Here are a few I can think of.

Record(int, int) f0(Record(int, int), value);
//                         ^ record parameter missing a name, or is this a function parameter named Record 
// (function parameter as written)

void main() {
  Record(int, String)
 { //  ^ is this a local function (yes), but also could be 
// an object instantiation / function call with a missing semicolon
// a variable declaration with a missing name & semicolon
// (Makes it harder to be rid of semicolons), though I guess this is an existing problem with blocks
  }
}

TimWhiting avatar Sep 16 '22 15:09 TimWhiting

Would using #( for types, but not for expressions, open up the possibility for record type literals? If we use either #( or Record do we plan to allow for record type literals?

natebosch avatar Sep 20 '22 19:09 natebosch

Would using #( for types, but not for expressions, open up the possibility for record type literals?

Yes, it would.

If we use either #( or Record do we plan to allow for record type literals?

I don't think we support Function(...) for function type type literals. I believe it's because relying on a built-in identifier (and not a true reserved word) makes that weird. So we probably wouldn't support record type literals if we use Record(...). Honestly, I think type literals are often more trouble than they're worth. I'm not inclined to double down on them because making the type annotation syntax overlap the expression syntax just paints us into a corner in a lot of ways. (Like the fact that we couldn't use class names as tear-offs for the unnamed constructor.) So I'm not inclined to make record type type literals a priority. But if we picked a syntax like #(...) which made it clearly unambiguous, we definitely could allow them.

munificent avatar Sep 20 '22 19:09 munificent