[Proposal]: Nominal and Collection Deconstruction

Open YairHalberstadt opened this issue 3 years ago • 15 comments

Nominal and Collection Deconstruction

[x] Proposed
[ ] Prototype: Not Started
[ ] Implementation: Not Started
[ ] Specification: Not Started

Summary

We allow deconstructing an instance into its constituent properties/fields in a way paralleling how property patterns can conditionally deconstruct an instance, and positional deconstruction can deconstruct instances with a suitable Deconstruct method.

We similiarly allow deconstructing a collection into its constituent elements in a way parraleling list patterns.

Motivation

It is common to want to extract a number of fields/properties from an instance. Currently this is possible to do declaratively using property patterns, but the fields/properties are only assigned when the pattern matches. This forces you to put your code within an if statement if you want to use pattern matching to declaratively extract a number of properties from an instance. In order to keep this brief I will link to a motivating example from an earlier discussion: https://github.com/dotnet/csharplang/discussions/3546.

Additionally there's an aspect of symmetry in the language (see #3107 for more on this theme):

There is currently a parralelism in two dimensions between positional data, nominal data, and collections on one axis, and declaration, construction, deconstruction, and pattern matching on the other.

You can declare types positionally using positional records/primary constructors. You can construct an instance positionally using a constructor, you can deconstruct it using positional deconstructions, you can pattern match it using a positional pattern.

You can declare types nominally through properties/fields. You can construct an instance nominally through an object initialize and you can pattern match it using property patterns.

You can construct a collection using a collection initializer, and you will likely soon be able to pattern match it using list patterns.

This proposal fills in two of the three missing squares here by introducing nominal and sequence deconstructions.

Detailed design

High level overview.

We have 3 aims which inform this design:

Make the most common cases as easy as possible.
Maintain symmetry with existing constructs (positional deconstructions and patterns).
Don't block ourselves from making enhancements in future language versions.

The most common case is to simply want to declare a bunch of variables. Here we take a cue from positional deconstruction, which allow you to preface a deconstruction with var to automatically declare locals for all identifiers within the deconstruction:

var {
    Start: { Line: startLine, Column: startColumn },
    End: { Line: endLine, Column: endColumn },
} = textRange;

This declares 4 variables, startLine, startColumn, endLine, endColumn.

Positional deconstruction also allows you to specify the type explicitly, and assign to arbitrary lValues, so we allow that by leaving off the var:

{
    Start: { Line: long startLine, Column: C.Column },
    End: var { Line: endLine, Column: endColumn },
} = textRange;

Patterns can contain any arbitrary pattern so we allow nesting any deconstruction in any other, e.g:

var ({ A: [ a, b, c] }, d) = (new { A = new[]{1, 2, 3} }, 4);

Patterns can assign a pattern to a variable, even if the pattern itself contains other nested patterns, so we allow that:

var {
    Start: { Line: startLine, Column: startColumn } start,
    End: { Line: endLine, Column: endColumn } end,
} = textRange;

It's useful to be able to assign such a variable to an existing local, like so:

TextPoint start;
{ Start: { ... } start } = textRange;

On the other hand, we want to be able to declare a new local. We can't do so by putting var beforehand, since that makes all nested identifiers declare new locals. We don't want to do so by putting an explicit type beforehand, since that would lead to a confusing difference between var and other types. Instead we say that {} identifier declares a new local if one does not exist, and otherwise assigns to the existing local. This is very different to how C# works so far and may be reconsidered.

We apply all these principles to positional and collection deconstructions as well, so the grammar and spec for the 3 deconstructions is very similiar.

Unlike patterns, deconstruction does no checking for null, or bounds checking, and will throw a NullReferenceException or a IndexOutOfRangeException if these are violated. As ever, the compiler will warn you if you deconstruct a maybe null reference.

Changes to grammar


statement
    : ...
    | declaration_statement
    ;

declaration_statement
    : declaration_target '=' expression ';'
    | type single_variable_designation ('=' expression)? (',' single_variable_designation ('=' expression)?)* ';'
    ;

foreach_statement
    : ...
    | 'foreach' '(' declaration 'in' expression ')' embedded_statement
    ;

declaration
  : 'var' var_variable_designation
  | type single_variable_designation
  ;

variable_designation
  : var_variable_designation
  | single_variable_designation
  | discard_designation
  ;
  
var_variable_designation
  : parenthesized_variable_designation
  | nominal_variable_designation // new
  | sequence_variable_designation // new
  ;

parenthesized_variable_designation
  : '(' variable_designation ',' variable_designation (',' variable_designation)+ ')' identifier?
  ;

nominal_variable_designation
  : '{' named_variable_designation (',' named_variable_designation)* ','? '}' identifier?
  ;
  
sequence_variable_designation
  : '[' variable_designation (',' variable_designation)* ']' identifier?
  ;

named_variable_designation
  : identifier ':' variable_designation
  ;

single_variable_designation
  : identifier
  ;

discard_designation
  : '_'
  ;
  
declaration_target
  : declaration
  | deconstruction
  ;
  
deconstruction
  : positional_deconstruction
  | nominal_deconstruction // new
  | sequence_deconstruction // new
  ;
  
declaration_target_or_expression
  : declaration_target
  | expression // see https://github.com/dotnet/roslyn/blob/fbf1583ed659db06e903d877b35c3cbd45eb7e1d/src/Compilers/CSharp/Portable/Generated/CSharp.Generated.g4#L685 for complete list
  ; 
  
positional_deconstruction
  : '(' declaration_target_or_expression ',' declaration_target_or_expression (',' declaration_target_or_expression)+ ')' identifier?
  ;
  
nominal_deconstruction
  : '{' nominal_deconstruction_element (',' nominal_deconstruction_element )*, ','? '}' identifier?
  ;
  
sequence_deconstruction
  : '[' declaration_target_or_expression (',' declaration_target_or_expression)* ']' identifier?
  ;
  
nominal_deconstruction_element 
  : identifier ':' declaration_target_or_expression
  ;

Examples:

// Short-hand deconstruction
var (x, y) = e;   
var [x, y] = e;      
var { A: a } = e;      

// Recursive deconstruction
(var x, var y) = e;
[var x, var y] = e;
{ A: var a } = e;

// Bind to an existing l-value
(x, y) = e;
[x, y] = e;
{ A: a } = e;

Detailed Spec

`variable_designation`

A var_variable_designation is lowered recursively as follows:

Every var_variable_designation has a unique target t, which is a temporary variable of type T inferred from the expression that is assigned to t. If the var_variable_designation is the top level var_variable_designation in a declaration_statement we assign expression to t. If the var_variable_designation is the top level var_variable_designation in a foreach_statement we assign enumerator.Current to t. Else t is defined recursively below.
If a var_variable_designation defines an identifier i, we declare a local of type T? and name i and the same scope as the scope of the declaration_statement/foreach_statement, and assign t to i.
Assuming the var_variable_designation has n child variable_designations v0 to vn - 1, we produce a set of child temps t0 to tn - 1 as follows.
1. If the var_variable_designation is a parenthesized_variable_designation we look for a suitable deconstructor on T to deconstruct t into t0 to tn - 1. See the spec for more details.
2. If the var_variable_designation is a nominal_variable_designation, for each named_variable_designation with identifier ix, t must have an accessible property or field ix, and we assign t.ix to tx (this should match the spec for property patterns).
3. If the var_variable_designation is a sequence_variable_designation, t must have an indexer accepting a single parameter of type int, and we assign t[x] to tx (this should match and keep up to date with spec for collection patterns, e.g. we may allow use of GetEnumerator here).
For each child variable_designation vx
1. If vx is a var_variable_designation we lower vx as specified here, using tx as t for vx.
2. If vx is single_variable_designation with identifier ix we declare a local of type Tx? and name ix and the same scope as the scope of the declaration_statement/foreach_statement, and assign tx to ix.
3. If vx is a discard_designation we do nothing.

`deconstruction`

A deconstruction is lowered recursively as follows:

Every deconstruction has a unique target t, which is a temporary variable of type T inferred from the expression that is assigned to t. If the deconstruction is the top level deconstruction in a declaration_statement we assign expression to t. Else t is defined recursively below.
If a deconstruction defines an identifier i
1. If there is a local in scope with name i we assign t to i.
2. Else we declare a local of type T? and name i and the same scope as the scope of the declaration_statement, and assign t to i.
Assuming the deconstruction has n child declaration_target_or_expressions d0 to dn - 1: If this is a top level deconstruction: For each declaration_target_or_expression dx
1. If dx is an expression, it must be a valid lValue as defined by the spec, and we evaluate as much of dx as is evaluated before the RHS of an assignment operator as defined by the spec. The result of this evaluation is stored in a temp dtx.
2. If dx is a deconstruction we perform this step recursively to evaluate as much of it's child expressions as are necessary.
We produce a set of child temps t0 to tn - 1 as follows.
1. If deconstruction is a positional_deconstruction we look for a suitable deconstructor on T to deconstruct t into t0 to tn - 1. See the spec for more details.
2. If the deconstruction is a nominal_deconstruction, for each nominal_deconstruction_element with identifier ix, t must have an accessible property or field ix, and we assign t.ix to tx (this should match the spec for property patterns).
3. If the deconstruction is a sequence_deconstruction, t must have an indexer accepting a single parameter of type int, and we assign t[x] to tx (this should match and keep up to date with spec for collection patterns, e.g. we may allow use of GetEnumerator here).
For each child declaration_target_or_expression dx
1. If dx is an expression we assign tx to dtx as specified by the spec on simple assignment. The assignment must be valid according to the rules specified there.
2. If dx is a declaration
  1. If the declaration is a var_variable_designation we lower vx as specified above, using tx as t for vx.
  2. If the declaration is a single_variable_designation with identifier ix and type Tx we declare a local of type Tx`` and name ixand the same scope as the scope of thedeclaration_statement, and assign txtoix. If typeisvar Txis inferred fromtx`.
  3. If the declaration is a discard_designation we do nothing.
3. If dx is a deconstruction we lower dx as specified here, using tx as t for dx.

Drawbacks

This is a significant set of enhancements to deconstruction. Deconstruction is far less common than pattern matching, so it may be that the benefit from this set of enhancements is not considered sufficient to pay for itself.

Parsing ambiguities

In order to distinguish between a nominal_deconstruction and a block, we need to parse till we reach a , a ; or the closing brace (at which point we can check if it's followed by a = or not). This lookahead may be expensive. However much of the parsed syntax tree can be reused between the two cases.

In order to distinguish between a positional_attribute and an attribute on a local function we need to parse till we reach the closing ] and check to see if it's followed by a = or not. This may also be expensive, although I imagine the most expensive cases will quickly run into something that will disambiguate them, such as expressions that are disallowed in attributes.

If expression blocks are added in the future, this may possibly lead to genuine ambiguities even at a semantic level. E.g { P : (condition ? ref a : ref b) } = e; could be a nominal deconstruction, or an assignment to an expression block containing the label P. It shouldn't be too difficult to work around this (e.g. disallow labels for final expression of an expression block).

Alternatives

There are a number of simplifications to this spec we could consider:

Only allow the var form of the patterns as the most common.
Don't allow mixing the different forms of deconstruction.
Don't allow declaring a local as well as a deconstruction. etc.

As well there's a lot of axis on which the exact grammar/semantics could be adjusted. I hope I made clear in my high level overview why I made the decisions I did, but I will not be surprised if others come to different conclusions.

Unresolved questions

How do we modify the spec I've given above to allow target typing of literals in the case of tuple deconstruction.

Design meetings

https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-11-16.md#nominal-and-collection-deconstruction

Oct 28 '20 20:10 YairHalberstadt

IMO I'd rather build deconstruction on top of pattern matching grammar rather than trying to mirror it. I get that you're trying to achieve a parity with tuple deconstruction, but to me it just seems to make everything significantly more complicated.

I'd be happy if deconstruction was nothing more than:

pattern = relational_expression;

And the compiler would warn if the single match was not exhaustive. Maybe error if there are no pattern variables.

I know that you've had a long discussion about this on the Discord channel so please pardon me for being a bit naive about it. Like most things it's probably a lot more complicated than it seems it should be on the surface. 😄

Oct 28 '20 20:10 HaloFour

If you want to factor out the "shorthand" syntax beyond the existing positional deconstruction, that will probably work with a primary pattern on the left, we just require l-values instead of constants which I agree is a good idea - otherwise all of this is necessary.

Also, we shouldn't check for "exhaustiveness", we just skip the checks. For example var (x, y) can be non-exhaustive, but we permit it as a deconstruction anyways.

Oct 28 '20 22:10 alrz

@alrz

For example var (x, y) can be non-exhaustive, but we permit it as a deconstruction anyways.

How is that deconstruction non-exhaustive today? I thought it only worked if the expression was a tuple known to be two elements, or the expression had a compatible Deconstruct method with an arity of two elements?

Oct 28 '20 22:10 HaloFour

Tuple<int, int> t = null;
var (x, y) = t; // throw
if (t is var (x, y)) // false

Oct 28 '20 22:10 alrz

Thanks for the very detailed write up Yair. I'm going to champion this as I'm interested in deconstruction assignment moving forward, I'll read through the proposed spec at a later point.

Oct 28 '20 22:10 333fred

Ah yeah, due to Deconstruct extension methods being unable to convey success it's assumed that they're always successful. This would be the case with any deconstructable reference type. With #nullable enable you at least get a warning about the deconstruction, though.

Oct 28 '20 22:10 HaloFour

If you only exclude null checks, sure. But that will make it impossible to introduce list deconstruction as it always needs to match a specific length.

Oct 28 '20 22:10 alrz

I'm much less interested in how they're implemented and the exact specifications than I am that they feel natural and intuitive with pattern matching and that one can take a match with a single case with pattern variables and convert it quickly and easily into a deconstruction. List deconstruction shouldn't feel different from list matching, etc. But aside the nitpickiness I love the idea and hope it is added to the language. 🥳

Oct 28 '20 22:10 HaloFour

A quick explanation as to why the spec can't be as simple as: "try to pattern match this, and throw a DeconstructionException if the pattern returns false".

There are a number of differences between pattern matching and deconstruction that makes this impossible.

Deconstruction does not check for null. It lowers the deconstruction into a set of membercalls, which will throw a NullReferenceException if any of the targets is null, just like any other code.

E.g.

using System;
public class C {
    public int M() {
        var(x, (y, z)) = (1, (C)null);
        return x + y + z;
    }
    void Deconstruct(out int a, out int b) => throw null;
}

Is lowered to

    public int M()
    {
        int a;
        int b;
        ((C)null).Deconstruct(out a, out b);
        int num = a;
        int num2 = b;
        return 1 + num + num2;
    }

Deconstruction allows implicit casting, whereas pattern matching does not.

E.g. This will compile:

public class C {
    public void M() {
        (long a, int b) = ((int)1, new C());
    }
    public static implicit operator int(C c) => 42;
}

But the equivalent match ((int)1, new C()) is (long, int) would return false.

Pattern matching will fail if a variable is null, whereas this is perfectly ok in a deconstruction.

E.g.

(object a, object b) = (null, null);

Will successfully assign null to a and b, but the pattern ((object)null, (object)null) is (object, object)` would return false.

Oct 29 '20 15:10 YairHalberstadt

Adding a new way to get NullReferenceException seems like a bad idea.

Dec 16 '20 23:12 gafter

@gafter you would still get NRT warnings, just like you would today when you would have to deconstruct it manually.

Dec 16 '20 23:12 YairHalberstadt

Excellent proposal that deals with every scenario I can think of.

Apr 02 '21 13:04 chrizpro

That is really nice, I think this could be extended for methods and lambda arguments too.

public record Foo(int Bar, string Baz);

// So this
public DoSomething(Foo foo) => Console.WriteLine($"{foo.Bar} and {foo.Baz}"); 

// could be this
public DoSomething(Foo { Bar: bar, Baz: baz }) => Console.WriteLine($"{bar} and {baz}"); 
//or this
public DoSomething(Foo (bar, baz)) => Console.WriteLine($"{bar} and {baz}");

Jul 30 '21 02:07 lucasteles

Seems like a ton of new syntax to make the rare occasions you need to do something like this marginally more terse (and possibly unclear to the 99.9% of C# devs who will never use the new syntax).

Jun 08 '22 14:06 MgSam

Have a discussion about something that relates to this, mainly for function arguments, but it could be great to think of both as one

https://github.com/dotnet/csharplang/discussions/6055

Jun 09 '22 23:06 lucasteles

I think this could be kinda nice, if it ever goes forward. I just enabled C# 11 in a small new project and, aware of the new list patterns but without thinking very much about the nuances of pattern-matching v deconstruction, typed this line of code:

var [A, B, C, D] = solve(m);

where solve returns a 4-element array. It makes sense to me why this isn't a feature (yet?), since this code asserts (instead of tests) the shape of the array and the compiler doesn't really know much about the array beyond its rank, unlike with tuple or other Deconstruction.

But anyway, I used the following as an alternative:

var ABCD = solve(m);
var (A, B, C, D) = (ABCD[0], ABCD[1], ABCD[2], ABCD[3]);

Dec 07 '22 16:12 rummelsworth

@rummelsworth You can use the list pattern to do what you want with the added benefit of verifying the list's size:

var ABCD = new[] { "A", "B", "C", "D" };
if (ABCD is [var A, var B, var C, var D])
{
	// Do alphabet stuff...
	Console.WriteLine(A+B+C+D);
}

HTH

Dec 09 '22 08:12 mrwensveen

Thanks @mrwensveen, I tried this. Unfortunately, my actual code doesn't really make sense using an if just to deconstruct the array. I would need to add an else that throws an exception to achieve what I want (or rather, what the algorithm requires), and it ends up being somewhat obfuscating. Nonetheless, thanks for the suggestion. And technically, the list pattern might be a tiny bit faster. Here's the output of a benchmark I threw together:

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19045.2251)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2


|      Method |     Mean |     Error |    StdDev |
|------------ |---------:|----------:|----------:|
| ListPattern | 7.338 ns | 0.1102 ns | 0.0977 ns |
| ArrayAccess | 8.247 ns | 0.1912 ns | 0.1788 ns |

Dec 10 '22 03:12 rummelsworth

I like the idea of expanding deconstruction in such a way, but what I really find troublesome is the expansion of the expression in the LHS. In the provided example:

var {
    Start: { Line: startLine, Column: startColumn },
    End: { Line: endLine, Column: endColumn },
} = textRange;

we have a syntactically massive LHS, with the RHS only being a simple identifier expression. To me, this looks somewhat unorthodoxical. What I believe would be nicer to have is to use another way to declare the deconstruction, in a more fluent-like syntax:

textRange into var
{
    Start: ...
    End: ...
};

The into keyword would play the role of declaring the deconstruction using pattern matching, and not the traditional deconstruction as provided by declared Deconstruct methods. Currently, a similar syntax achieves this goal, using pattern matching in an is expression, but discarding the result of the evaluation. This would be equivalent to (if not mistaken):

_ = textRange is {
    Start: { Line: var startLine, Column: var startColumn },
    End: { Line: var endLine, Column: var endColumn },
};

Another approach can be to keep the LHS/RHS arrangement, but replace the = with another keyword like from. Similar to the above pattern, the only change is that the arrangement is inverted into this:

var {
    Start: { Line: var startLine, Column: var startColumn },
    End: { Line: var endLine, Column: var endColumn },
}
from textRange;

The only problem with this approach is that it might make parsing slightly more difficult by having to handle the from keyword in a non-LINQ context.

May 02 '23 14:05 Rekkonnect

@Rekkonnect

This proposal is intentionally attempting to keep a symmetry with pattern matching syntax, and with the existing positional deconstruction.

Already legal:

// pattern matching
if (obj is (var name, var age)) { ... }

// deconstruction
var (name, age) = obj;

May 02 '23 14:05 HaloFour

csharplang csharplang copied to clipboard

[Proposal]: Nominal and Collection Deconstruction

Nominal and Collection Deconstruction

Summary

Motivation

Detailed design

High level overview.

Changes to grammar

Detailed Spec

variable_designation

deconstruction

Drawbacks

Parsing ambiguities

Alternatives

Unresolved questions

Design meetings

csharplang
csharplang copied to clipboard

`variable_designation`

`deconstruction`