csharplang icon indicating copy to clipboard operation
csharplang copied to clipboard

[Proposal]: Collection expressions

Open CyrusNajmabadi opened this issue 2 years ago • 490 comments

Collection expressions

  • [x] Proposed
  • [ ] Specification: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md
  • [x] Prototype: Complete
  • [x] Implementation: In progress. Shipping everything but 'natural types' and 'dictionary literals' in C# 12.

Many thanks to those who helped with this proposal. Esp. @jnm2!

Summary

Collection expressions introduce a new terse syntax, [e1, e2, e3, etc], to create common collection values. Inlining other collections into these values is possible using a spread operator .. like so: [e1, ..c2, e2, ..c2]. A [k1: v1, ..d1] form is also supported for creating dictionaries.

Several collection-like types can be created without requiring external BCL support. These types are:

Further support is present for collection-like types not covered under the above, such as ImmutableArray<T>, through a new API pattern that can be adopted directly on the type itself or through extension methods.

Motivation

  • Collection-like values are hugely present in programming, algorithms, and especially in the C#/.NET ecosystem. Nearly all programs will utilize these values to store data and send or receive data from other components. Currently, almost all C# programs must use many different and unfortunately verbose approaches to create instances of such values. Some approaches also have performance drawbacks. Here are some common examples:

    • Arrays, which require either new Type[] or new[] before the { ... } values.

    • Spans, which may use stackalloc and other cumbersome constructs.

    • Collection initializers, which require syntax like new List<T> (lacking inference of a possibly verbose T) prior to their values, and which can cause multiple reallocations of memory because they use N .Add invocations without supplying an initial capacity.

    • Immutable collections, which require syntax like ImmutableArray.Create(...) to initialize the values, and which can cause intermediary allocations and data copying. More efficient construction forms (like ImmutableArray.CreateBuilder) are unweildy and still produce unavoidable garbage.

  • Looking at the surrounding ecosystem, we also find examples everywhere of list creation being more convenient and pleasant to use. TypeScript, Dart, Swift, Elm, Python, and more opt for a succinct syntax for this purpose, with widespread usage, and to great effect. Cursory investigations have revealed no substantive problems arising in those ecosystems with having these literals built in.

  • C# has also added list patterns in C# 10. This pattern allows matching and deconstruction of list-like values using a clean and intuitive syntax. However, unlike almost all other pattern constructs, this matching/deconstruction syntax lacks the corresponding construction syntax.

  • Getting the best performance for constructing each collection type can be tricky. Simple solutions often waste both CPU and memory. Having a literal form allows for maximum flexibility from the compiler implementation to optimize the literal to produce at least as good a result as a user could provide, but with simple code. Very often the compiler will be able to do better, and the specification aims to allow the implementation large amounts of leeway in terms of implementation strategy to ensure this.

An inclusive solution is needed for C#. It should meet the vast majority of casse for customers in terms of the collection-like types and values they already have. It should also feel natural in the language and mirror the work done in pattern matching.

This leads to a natural conclusion that the syntax should be like [e1, e2, e3, e-etc] or [e1, ..c2, e2], which correspond to the pattern equivalents of [p1, p2, p3, p-etc] and [p1, ..p2, p3].

A form for dictionary-like collections is also supported where the elements of the literal are written as k: v like [k1: v1, ..d1]. A future pattern form that has a corresponding syntax (like x is [k1: var v1]) would be desirable.

Detailed design

The content of the proposal has moved to proposals/collection-expressions.md. Further updates to the proposal should be made there.

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-11-01.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-03-09.md#ambiguity-of--in-collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-28.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-04-03.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-04-26.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-05-03.md#collection-literal-natural-type https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-05-31.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-06-05.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-06-19.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-07-12.md#collection-literals https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-09-18.md#collection-expression-questions https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-09-20.md#collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-09-25.md#defining-well-defined-behavior-for-collection-expression-types https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-09-27.md#collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-10-02.md#collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-10-11.md#collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-11-15.md#nullability-analysis-of-collection-expressions https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-01-10.md https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-02-26.md#collection-expressions

Working group meetings

https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2022-10-06.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2022-10-14.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2022-10-21.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-04-05.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-04-28.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-05-26.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-06-12.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-06-26.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-08-03.md https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2023-08-10.md

CyrusNajmabadi avatar Oct 27 '21 16:10 CyrusNajmabadi

Overall looks good! Just a couple of points.

This facility thus prevents general use of such a marked method outside of known safe compiler scopes where the instance value being constructed cannot be observed until complete.

In the context of collection literals, the presence of these methods would allow types to trust that data passed into them cannot be mutated outside of them, and that they are being passed ownership of it. This would negate any need to copy data that would normally be assumed to be in an untrusted location.

This only works because at the moment it happens to be only a compiler can call init methods, if you don't use them yourself in one of your init properties.

However it doesn't seem like the sort of thing we'd want to rely on not changing in the future. For example we might allow calling init methods in the object initializer, at which point this would no longer be safe:

int[] ints = [1,2,3];
var immutArray = new{ Init(ints) };
int[0] = 5;

To resolve this, we could say that when evaluating a literal's spread_element expression, that there was an implicit target type equivalent to the target type of the literal itself. So, in the above, that would rewritten as:

It seems like this is not the most efficient solution - instead we would want to effectively inline the element, never materializing them into an array in the first place, and instead storing all the sub_elements on the stack.

Should we expand on collection initializers to look for the very common AddRange method? It could be used by the underlying constructed type to perform adding of spread elements potentially more efficiently. We might also want to look for things like .CopyTo as well. There may be drawbacks here as those methods might end up causing excess allocations/dispatches versus directly enumerating in the translated code.

We could only use those methods when the type of the spread_element exactly matches the parameter type of these methods, meaning we would not be causing virtual dispatch.

Can an unknown length literal create a collection type that needs a known length, like an array, span, or Init(array/span) collection? This would be harder to do efficiently, but it might be possible through clever use of pooled arrays and/or builders.

This is an extremely common use case - ToArray is the most commonly used Linq method. It would be unfortunate if the one syntax to rule them all required you to do: ((List<int>)[1, .. subspread, 2]).ToArray().

YairHalberstadt avatar Oct 28 '21 03:10 YairHalberstadt

@YairHalberstadt All good points. Thank you :)

CyrusNajmabadi avatar Oct 28 '21 03:10 CyrusNajmabadi

Linking to https://github.com/bartdesmet/csharplang/.../proposals/params-builders.md on the builder pattern.

alrz avatar Oct 28 '21 10:10 alrz

What if collection literals with an unknown length have a different natural type from those with a known length? The latter should likely be a List<T>, but the latter is an IEnumerable<T> with an unspeakable implementation? That is, [..c1, ..c2] where either collection has an unknown length is semantically equivalent to Enumerable.Concat(c1, c2).

orthoxerox avatar Oct 28 '21 20:10 orthoxerox

@orthoxerox Once you're able to target-type literals of unknown length to T[] and ImmutableArray<T>, which I very much hope is made possible and which is not solved by giving such literals a different natural type, then will it still be advantageous to give them a different natural type?

jnm2 avatar Oct 28 '21 21:10 jnm2

A downside to using an array would be if a natural type is added for collection literals and that natural type is not T[]. There would be a potentially surprising difference when refactoring between var x = [1, 2, 3]; and IEnumerable x = [1, 2, 3];.

Can you explain this to me, please? Wouldn't whatever type you choose as a natural type implement IEnumerable<T>? What would the difference be?

erikhermansson79 avatar Oct 29 '21 06:10 erikhermansson79

@erikhermansson79 Besides is type checks and pattern matching behaving differently, as well as GetType, and overload resolution if you use dynamic, there's also this for example: casting to IList and reading IsFixedSize will be observably different. If you place this in a public IEnumerable<int> property and databind using a UI framework, the difference in behavior could amount to a user-facing regression. And so on.

jnm2 avatar Oct 29 '21 09:10 jnm2

@orthoxerox Once you're able to target-type literals of unknown length to T[] and ImmutableArray<T>, which I very much hope is made possible and which is not solved by giving such literals a different natural type, then will it still be advantageous to give them a different natural type?

I agree, enumerating the collection literal just by changing the type of the variable it's assigned to sounds confusing.

orthoxerox avatar Oct 29 '21 13:10 orthoxerox

I think proposals benefit from having a section of various examples. Most people can't look at grammar rules and get a good feel for what the syntax will actually look like.

Separately, I will slightly object to this statement,

Looking at the surrounding ecosystem, we also find examples everywhere of list creation being more convenient and pleasant to use. TypeScript...

TS/JS just has a different syntax for initializing arrays. I don't know that its really much more convenient than new[] { ... }. It certainly doesn't have a generalized collection initialization syntax whatsoever. If you use a collection type other than Array in JS you are SOL.

MgSam avatar Oct 29 '21 16:10 MgSam

I think proposals benefit from having a section of various examples

It will look like: [a, b, .. c, d]

CyrusNajmabadi avatar Oct 29 '21 16:10 CyrusNajmabadi

Separately, I will slightly object to this statement,

The statement was that the presence of this literal form has not proven to itself be problematic for these languages. Not that the literal is sufficient for all usages in those languages.

In other words, these literals are not in "the bad parts". Nor are there contingents if users recommending people not use these.

CyrusNajmabadi avatar Oct 29 '21 16:10 CyrusNajmabadi

Here's an immediate example I can think of:

var processArguments =
[
    "pack",
    "-o", outputPath,
    ..(configuration is not null ? (["-c", configuration]) : []),
    "/bl:" + Path.Join(artifactsDir, @"logs\pack.log"),
];

jnm2 avatar Oct 29 '21 16:10 jnm2

I'm pretty firmly against approaches that either:

  1. Involve lazy evaluation. For that, use linq.
  2. Change the natural type of the expression based on the values within. I think that will just be too confusing and difficult to reason about.

CyrusNajmabadi avatar Oct 29 '21 16:10 CyrusNajmabadi

Why not use curly braces like we have them for arrays already?

So you could write:

List<int> myInts = {1, 2, 3, 4};

bernd5 avatar Oct 29 '21 16:10 bernd5

Why not use curly braces like we have them for arrays already?

This is referred to in the motivation section, but i'll give a little more information. Effectively {s have issues for us in being ambiguous in some situations if they mean lists or properties. So we've moved to [ with list patterns to have no ambiguity and to give a nice and clean syntax for list-like things. When we designed that we were aware that we likely wanted a 'literal correspondence' as well, which this proposal is.

I also cover the move from { to [ in the drawbacks section, albeit with the idea that this would allow us to have uniformity everywhere moving foward.

ALso note that { as an expression-form is potentially highly destabalizing for future work. For example, if we ever want a block expression then we couldnt' have that use { if { is also for lists. We sketched out and very much liked [...] for list patterns and these literals (both for looks, and for sidestepping a lot of issues), so we made the decision back in the pattern design to go this direction. We simply ordered it as "list pattern in c#10" and now hopefully "list literal in c#11".

CyrusNajmabadi avatar Oct 29 '21 17:10 CyrusNajmabadi

I think proposals benefit from having a section of various examples

It will look like: [a, b, .. c, d]

Not sure if you're being cheeky or what, but that is not a substitute for an example section. Some suggestions:

  • Before and afters of the various init syntaxes now vs how they'll look if using the proposed
  • Including the full context of where they might be used - declarations, method calls, etc. This is particularly relevant given C# already has the special array initialization syntax { } which only works for declarations
  • Since this is target-typing, presumably this doesn't work with var. Examples would help illustrate that
  • Does initializing collections of mixed types "just work" or does it require casting? For example, today I have to do new object[] { 1, "foo" } because the compiler cannot infer a best type.
  • Since spread operators currently don't work at all in existing collection initialization syntaxes, I think that needs to be segregated from other examples

MgSam avatar Oct 29 '21 18:10 MgSam

Since this is target-typing, presumably this doesn't work with var. Examples would help illustrate that

This is covered in the spec outline and the things to discuss. I don't want examples that imply things are not possible when no decision has been made on it.

CyrusNajmabadi avatar Oct 29 '21 18:10 CyrusNajmabadi

Does initializing collections of mixed types "just work"

This is covered in the spec as well. But I'll call out explicitly.

CyrusNajmabadi avatar Oct 29 '21 18:10 CyrusNajmabadi

Regarding immutability, here's an interesting thought:

Introduce "let" as a way of declaring an "immutable" variable.

let x=1;

Creates a readonly integer.

let x = [1,2,3];

Makes an immutable list

var x = [1,2,3];

Makes a regular list.

TonyValenti avatar Oct 30 '21 00:10 TonyValenti

Introduce "let" as a way of declaring an "immutable" variable.

The language has no concept of immutability (and i doubt it is likely to get one any time soon). I does have a concept of 'readonly' (and 'let' is already somethin we're considering there). So it likely wouldn't be a good fit as there would be inconsistency there.

Interesting idea though!

CyrusNajmabadi avatar Oct 30 '21 00:10 CyrusNajmabadi

It would have to work for let x = new List<MyMutableClass> { new MyMutableClass() }; too.

jnm2 avatar Oct 30 '21 00:10 jnm2

init methods would be cool for other things as well. There have been many issues, discussions etc. in the past that asked for a way to move common initialization code from a ctor into a (specifically marked) method which then may still initialize readonly (and now: init) members which right now is limited to the ctor itself (for readonly) and initialization contexts (for init).

BhaaLseN avatar Oct 30 '21 11:10 BhaaLseN

Regarding the implicit type ("natural type"), I would say the array has 2 pros and a con. One pro is that it's the only "built-in" type - meaning, it's a type which is part of the language. Second - it's the most efficient collection (correct me if I'm mistaken) - all other collection rely on arrays behind the scenes. The con - arrays already have a simple enough new[] { ... } syntax, so that leads me to List<T>. In my work, the most annoying boilerplate is the new List<T> { ... } - replacing that with [ ... ] would be a great improvement.

PS. Another reason to go with List<T> - I and probably many others follow the rule of using explicit types for keyword types, like int[], and var for other types, including List<int>. This means if I want an array, I can always do int[] a= [1,2,3]; - which aligns with how I manage explicit/implicit types, and for List<T>, I currently do var a = new List<int> { 1, 2, 3, }; which is greatly simplified to var a = [1, 2, 3];

PPS. On second thought, perhaps we can just say no (for now) to implicit type and be done with it.

TahirAhmadov avatar Nov 01 '21 14:11 TahirAhmadov

Other cons are that it heap allocates and that it is fixed length.

CyrusNajmabadi avatar Nov 01 '21 15:11 CyrusNajmabadi

Isn't it possible to make List<T> work with 1 allocation? Doesn't its new List<T>(123) ctor allocate the internal array to the specified size from the get go? Also, what's stopping us from using a special new ctor or static factory method (which can possibly be made internal) - surely that's not too much to add?

TahirAhmadov avatar Nov 01 '21 18:11 TahirAhmadov

The List<T> instance is one heap allocation, and the internal array is a second heap allocation. If the initial capacity is sufficient, there are no additional heap allocations or memory copies beyond that.

jnm2 avatar Nov 01 '21 19:11 jnm2

Oh my, that was such a brain fart - of course the List<> itself needs an allocation. Which immediately made me think of another idea - can a new type be created for this? Something called ValueList<T>. It'll be a struct, implement IList<T>, and have implicit conversion to List<T>. Internally, it can even perform the necessary analysis - if the # of items is low, keep the items on the stack, too; if it grows beyond a certain limit, say, 1024 bytes, move it to the heap (or go straight to heap if initial capacity is >=1024 bytes).

PS. Internally, this type can either 1) use if statements to determine whether it's operating in stack or heap mode, or 2) have delegates which are assigned either the stack or heap "handlers".

TahirAhmadov avatar Nov 01 '21 19:11 TahirAhmadov

  • can a new type be created for this?

You are certainly welcome to create a new type. That's a core part of this proposal that the proposal would work with any type that followed certain shapes.

Now, if the BCL would add a type like this? My guess would be no. Such a type would likely be highly problematic. For example, if you passed this ValueList to someone else, and they captured it, then they would only see portions of your mutations. For exaple, if you added items, they would not see it (since their length would not update). HOwever, if you mutated items prior to that point, they would see it (sinced they shared the same array) unless you (or them) also caused a reallocation (where you both would have distinct arrays). Also, if one added an element, and then the other added, the other would overwrit the first. etc. etc.

It would be enormously confusing.

CyrusNajmabadi avatar Nov 01 '21 19:11 CyrusNajmabadi

@TahirAhmadov Something very much in the spirit of what you just described is being considered: https://github.com/dotnet/runtime/pull/60519

jnm2 avatar Nov 01 '21 20:11 jnm2

Now, if the BCL would add a type like this? My guess would be no. Such a type would likely be highly problematic. For example, if you passed this ValueList to someone else, and they captured it, then they would only see portions of your mutations.

Yes, I originally was thinking only in the context of local usage of the collection. The problems you raised are very real. The only way to solve it would be to somehow mark this type as not being passable by value:

[ByValueUsageProhibited]
public struct ValueList<T>
{
  ...
}
ValueList<T> Prop { get; set; } // error: cannot pass this type by value
Action<ValueList<T>> action; // error
void Foo(ValueList<T> list) { } // error

ValueList<T> Bar() { ... } // no problem; Bar is a non-async, non-enumerable method which returns and 
// can therefore no longer touch the collection
void Bar2(in ValueList<T> list) { } // no problem; ref and out also OK
Func<ValueList<T>> fund; // no problem; we know that generic type is used as the return type

Now I know what you are thinking - this is hairy and adds scope. However, think about the benefits we're getting. If ValueList<> is added to the BCL, then it's possible to have a fully mutable collection which accepts all literal forms and needs zero heap allocations, as the natural/implicit type for collection literals (but it's not async-friendly). On second thought, implicit convertibility to List<> would be a mistake - it can hide nasty bugs similar to what you described; a special method, List<T> ToList(), would be needed to move the collection to the heap if and when necessary. Also, I just realized it cannot implement IList<T> for similar reasons - boxing will introduce confusion around the state of it; it can only implement IReadOnlyList<T> and it's base interfaces. PS. I typed the above and then realized that this is very similar to a ref struct. Can this be a ref struct? Would it solve our state-sharing problems?

@TahirAhmadov Something very much in the spirit of what you just described is being considered: dotnet/runtime#60519

That issue is for an array; I was thinking a mutable list. Now it's an open question if the technique they're using to shoehorn an array onto the stack can be made to work for a mutable list. Frankly, I doubt it; it would probably need a deeper framework change.

PPS. Having thought about this, here's a rough mock up: (more complete version)

public ref struct ValueList<T>
{
  public void Add(T item)
  {
    this.EnsureCapacity(this._size + 1);
    this.Set(this._size, item);
    ++this._size;
  }
  // Insert and Remove are implemented similarly - using Get/Set as abstractions to hide stack/heap mode
  public T this[int index] { get { if(index >= this._size) throw ...; return this.Get(index); } set { ... } }

  int _size;
  T[]? _array;
  T _item0, _item1, .... , _item127; // we'll need to figure out how many items to allow in stack mode

  T Get(int index) // also a similar Set is needed; these are low level access methods
  {
    if(this._array != null) return this._array[index];
    else { /* get (or set) the appropriate field; either a long switch or some "hack" to get by memory offset */ }
  }
  void EnsureCapacity(int capacity)
  {
    if(capacity > 128 && this._array == null)
    {
      this._array = new T[256]; 
      this._array[0] = this._item0;
      // and so on - or some unsafe memcpy type operation
    }
    else if(this._array != null && capacity > this._array.Length) { /* standard array resize - new, copy, assign */  }
  }
}

TahirAhmadov avatar Nov 01 '21 22:11 TahirAhmadov