csharplang [Proposal]: Auto-default structs

[Proposal]: Auto-default structs

Open RikkiGibson opened this issue 2 years ago • 27 comments

This proposal is raised as a possible mitigation for usability issues found in dotnet/csharplang#5552 and dotnet/csharplang#5635, as well as addressing #5563 (all fields must be definitely assigned, but field is not accessible within the constructor).

Speclet: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/auto-default-structs.md

Since C# 1.0, struct constructors have been required to definitely assign this as if it were an out parameter.

public struct S
{
    public int x, y;
    public S() // error: Fields 'S.x' and 'S.y' must be fully assigned before control is returned to the caller
    {
    }
}

This presents issues when setters are manually defined on semi-auto properties, since the compiler can't treat assignment of the property as equivalent to assignment of the backing field.

public struct S
{
    public int X { get => field; set => field = value; }
    public S() // error: struct fields aren't fully assigned. But caller can only assign 'this.field' by assigning 'this'.
    {
    }
}

We assume that introducing finer-grained restrictions for setters, such as a scheme where the setter doesn't take ref this but rather takes out field as a parameter, is going to be too niche and incomplete for some use cases.

One fundamental tension we are struggling with is that when struct properties have manually implemented setters, users often have to do some form of "repetition" of either repeatedly assigning or repeating their logic:

struct S
{
    private int _x;
    public int X
    {
        get => _x;
        set => _x = value >= 0 ? value : throw new ArgumentOutOfRangeException();
    }

    // Solution 1: assign some value in the constructor before "really" assigning through the property setter.
    public S(int x)
    {
        _x = default;
        X = x;
    }

    // Solution 2: assign the field once in the constructor, repeating the implementation of the setter.
    public S(int x)
    {
        _x = x >= 0 ? x : throw new ArgumentOutOfRangeException();
    }
}

Previous discussion

A small group has looked at this issue and considered a few possible solutions:

Require users to assign this = default when semi-auto properties have manually implemented setters. We agree this is the wrong solution since it blows away values set in field initializers.
Implicitly initialize all backing fields of auto/semi-auto properties.
- This solves the "semi-auto property setters" problem, and it squarely places explicitly declared fields under different rules: "don't implicitly initialize my fields, but do implicitly initialize my auto-properties."
Provide a way to assign the backing field of a semi-auto property and require users to assign it.
- This could be cumbersome compared to (2). An auto property is supposed to be "automatic", and perhaps that includes "automatic" initialization of the field. It could introduce confusion as to when the underlying field is being assigned by an assignment to the property, and when the property setter is being called.

We've also received feedback from users who want to, for example, include a few field initializers in structs without having to explicitly assign everything. We can solve this issue as well as the "semi-auto property with manually implemented setter" issue at the same time.

struct MagnitudeVector3d
{
    double X, Y, Z;
    double Magnitude = 1;
    public MagnitudeVector3d() // error: must assign 'X', 'Y', 'Z' before returning
    {
    }
}

Adjusting definite assignment

Instead of performing a definite assignment analysis to give errors for unassigned fields on this, we do it to determine which fields need to be initialized implicitly. Such initialization is inserted at the beginning of the constructor.

struct S
{
    int x, y;

    // Example 1
    public S()
    {
        // ok. Compiler inserts an assignment of `this = default`.
    }

    // Example 2
    public S()
    {
        // ok. Compiler inserts an assignment of `y = default`.
        x = 1;
    }

    // Example 3
    public S()
    {
        // valid since C# 1.0. Compiler inserts no implicit assignments.
        x = 1;
        y = 2;
    }

    // Example 4
    public S(bool b)
    {
        // ok. Compiler inserts assignment of `this = default`.
        if (b)
            x = 1;
        else
            y = 2;
    }

    // Example 5
    void M() { }
    public S(bool b)
    {
        // ok. Compiler inserts assignment of `y = default`.
        x = 1;
        if (b)
            M();

        y = 2;
    }
}

In examples (4) and (5), the resulting codegen sometimes has "double assignments" of fields. This is generally fine, but for users who are concerned with such double assignments, we can emit what used to be definite assignment error diagnostics as disabled-by-default warning diagnostics.

struct S
{
    int x;
    public S() // warning: 'S.x' is implicitly initialized to 'default'.
    {
    }
}

Users who set the severity of this diagnostic to "error" will opt in to the pre-C# 11 behavior. Such users are essentially "shut out" of semi-auto properties with manually implemented setters.

struct S
{
    public int X
    {
        get => field;
        set => field = field < value ? value : field;
    }

    public S() // error: backing field of 'S.X' is implicitly initialized to 'default'.
    {
        X = 1;
    }
}

At first glance, this feels like a "hole" in the feature, but it's actually the right thing to do. By enabling the diagnostic, the user is telling us that they don't want the compiler to implicitly initialize their fields in the constructor. There's no way to avoid the implicit initialization here, so the solution for them is to use a different way of initializing the field than a manually implemented setter, such as manually declaring the field and assigning it, or by including a field initializer.

Currently, the JIT does not eliminate dead stores through refs, which means that these implicit initializations do have a real cost. But that might be fixable. https://github.com/dotnet/runtime/issues/13727

It's worth noting that initializing individual fields instead of the entire instance is really just an optimization. The compiler should probably be free to implement whatever heuristic it wants, as long as it meets the invariant that fields which are not definitely assigned at all return points or before any non-field member access of 'this' are implicitly initialized.

For example, if a struct has 100 fields, and just one of them is explicitly initialized, it might make more sense to do an initobj on the entire thing, than to implicitly emit initobj for the 99 other fields. However, an implementation which implicitly emits initobj for the 99 other fields would still be valid.

Changes to language specification

We adjust the following section of the standard:

https://github.com/dotnet/csharpstandard/blob/draft-v6/standard/expressions.md#11712-this-access

If the constructor declaration has no constructor initializer, the this variable behaves exactly the same as an out parameter of the struct type. In particular, this means that the variable shall be definitely assigned in every execution path of the instance constructor.

We adjust this language to read:

If the constructor declaration has no constructor initializer, the this variable behaves similarly to an out parameter of the struct type, except that it is not an error when the definite assignment requirements (§9.4.1) are not met. Instead, we introduce the following behaviors:

When the this variable itself does not meet the requirements, then all unassigned instance variables within this at all points where requirements are violated are implicitly initialized to the default value (§9.3) in an initialization phase before any other code in the constructor runs.
When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-02-14.md#definite-assignment-in-structs

Feb 09 '22 22:02 RikkiGibson

Aren't we going to have the same questions for classes and semi-auto props and fields WRT nullability analysis?

Feb 14 '22 20:02 TahirAhmadov

Yes, I wrote a bit about nullable analysis of semi-auto properties but decided to remove it from here and raise it separately, since it seems like it doesn't really depend on the containing type kind.

Feb 14 '22 20:02 RikkiGibson

Regarding the conclusion from LDM:

Proposal is accepted, contingent on following up with emeritus LDT members on original reasoning around struct definite assignment.

We have followed up. We didn't find any new reasons, beyond what we've already considered, that the 'out this' behavior was chosen originally for struct constructors.

Feb 18 '22 21:02 RikkiGibson

So, @RikkiGibson - is this the (one of the?) reason you guys pulled all SDKs < 6.0.200 ? That was a surprise to my entire company and their expectations of CI/CD. People who had locked SDK to 6.0.100 (to avoid this breaking change until planning could be done) had the rug pulled completely by surprise. Was this culling announced anywhere?

Feb 21 '22 20:02 oising

6.0.101 is still available for download at https://dotnet.microsoft.com/en-us/download/dotnet/6.0, as all previous versions of the sdk are.

Feb 21 '22 20:02 333fred

6.0.101 is still available for download at https://dotnet.microsoft.com/en-us/download/dotnet/6.0, as all previous versions of the sdk are.

Sure, and I could ask my buddy to give it to me on a USB key. I think you're missing the point: All SDKs < 6.0.200 were instantaneously removed from Microsoft's container registry, and from their hosted build agents. This broke any CI/CD pipeline that was relying on them. i.e.

    - task: UseDotNet@2
      inputs:
        packageType: 'sdk'
        version: '6.0.101'

... stopped working on Azure Devops CI/CD.

Feb 21 '22 21:02 oising

@oising that doesn’t have anything to do with this repository. You'll need to open a bug with them.

Feb 21 '22 22:02 333fred

@oising that doesn’t have anything to do with this repository. You'll need to open a bug with them.

I give up. No... wait, one more try:

Is this bug one of the reasons why SDKs < 6.0.200 were removed?

Feb 21 '22 22:02 oising

@oising

I'd try asking here: https://github.com/dotnet/sdk

Feb 21 '22 22:02 HaloFour

Is this bug one of the reasons why SDKs < 6.0.200 were removed?

No. I'm not aware of why such SDKs were removed.

Feb 21 '22 22:02 RikkiGibson

@oising That seems like a mistake, if it happened. It would break so many builds. It doesn't appear to be the case, however: https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md#net-core-sdk (AzDO hosted agents and GitHub Actions) Maybe there was an infra problem?

Feb 21 '22 22:02 jnm2

Was? It's still going on:

https://hub.docker.com/_/microsoft-dotnet-sdk

@oising this repo has nothing to do with that part of the sdk. I would take it up with them thanks!

Feb 21 '22 23:02 CyrusNajmabadi

We recently explored the interaction between this feature and dotnet/roslyn#30194. An example of the interaction:

// ClassLibrary1.csproj
public struct S1
{
    private object x;
}

// ConsoleApp1.csproj (references ClassLibrary1.csproj)
public struct S2
{   
    public S1 s1;

    // assume all scenarios use the same compiler here.
    // C# 8: no diagnostics. implicit 's1 = default'.
    // C# 9: LangVersion warning diagnostic. implicit 's1 = default'.
    // C# 11: disabled-by-default warning diagnostic. implicit 's1 = default'.
    public S2(bool unused) { }
}

Our conclusions are:

The same emit is used in all language versions. This means that when the user updates their compiler version, the emit of existing code could change to add implicit default field assignments, where previously the user may have been using uninitialized memory. We think this is OK to do in all language versions because the behavior of using uninitialized memory is not defined, and this emit change simply moves the user from using an undefined behavior to a defined behavior when they update the compiler version.
If an old compiler gives a definite-assignment warning in some scenario, and the new compiler with preview LangVersion does not give that same warning, then the new compiler with old LangVersion should give a LangVersion warning.
Similarly: If an old compiler gives a definite-assignment error in some scenario, and the new compiler with preview LangVersion does not give that same error, then the new compiler with old LangVersion should give a LangVersion error.

Mar 17 '22 18:03 RikkiGibson

It looks like we are preserving the ability of a team to have different members on different compiler versions avoid breaking each other. So, I think this is a good place to land on this.

Mar 17 '22 18:03 KathleenDollard

This feature will be available in preview in VS 17.3.

Apr 04 '22 20:04 RikkiGibson

This is cool. I never really understood why structs had the explicit assignment requirement in the first place, though I may have read about it sometime in the past and just forgot.

Apr 13 '22 07:04 HopefulFrog

@HopefulFrog Because:

var x = new S();
x.F = 10;
x = new S();
Console.WriteLine(x.F); // will print 10 if F isn't initialized in struct constructor.

struct S
{
    public S() { }

    public int F;
}

Now, with this proposal implemented, the compiler will add F = 0 in the constructor for you (unless you already have F = something in the constructor in an always-reachable code path)

Apr 13 '22 07:04 Youssef1313

@HopefulFrog Because:
var x = new S();
x.F = 10;
x = new S();
Console.WriteLine(x.F); // will print 10 if F isn't initialized in struct constructor.

struct S
{
    public S() { }

    public int F;
}
Now, with this proposal implemented, the compiler will add F = 0 in the constructor for you (unless you already have F = something in the constructor in an always-reachable code path)

I'm not sure I understand your example. Are you saying that with structs, a new call somehow reuses the variable on the left side of the assignment? My assumption would have been that the second call of new S(); creates an entirely new instance that then overwrites the original one, meaning the value of F would no longer be 10, but would be default(int), if we're assuming that a value type that hasn't been explicitly assigned has its default value.

Apr 13 '22 08:04 HopefulFrog

My assumption would have been that the second call of new S(); creates an entirely new instance that then overwrites the original one, meaning the value of F would no longer be 10,

By default, that's not correct. The compiler now explicitly assign fields to their default values so that you see the behavior you want. Meaning the compiler adds some code so that F would no longer be 10.

To clarify more, for the example above, here is the IL of the struct constructor per SharpLab:

    .method public hidebysig specialname rtspecialname 
        instance void .ctor () cil managed 
    {
        // Method begins at RVA 0x2087
        // Code size 8 (0x8)
        .maxstack 8

        IL_0000: ldarg.0
        IL_0001: ldc.i4.0
        IL_0002: stfld int32 S::F
        IL_0007: ret
    } // end of method S::.ctor

Note that if the compiler didn't produce the instructions to assign 0 to F, then the code above will print 10.

Apr 13 '22 08:04 Youssef1313

Yeah, structs are weird. The constructor takes an 'out' ref to a variable as an argument and the compiler is free to use refs to uninitialized variables as arguments or to reuse variables that contained something else before as arguments.

Apr 14 '22 01:04 RikkiGibson

I had no idea. So, what happens in the constructor if it's not part of an assignment, like if your statement is just new S();?

Apr 14 '22 02:04 HopefulFrog

The compiler creates a temporary variable.

Apr 14 '22 06:04 RikkiGibson

What is the order of initialization of those fields? In fact, my question is about an odd situation that we have right now... if, for example, I create an ARGB struct using [FieldOffset], I can have something similar to a union, where I might have an intValue at offset 0 and the fields A, R, G, B from offsets 0 to 3. So, setting just the intValue should naturally set all the fields... but the compiler complains that not all fields were set. If the compiler "auto-set" them to 0, but after intValue was set, we will be introducing a bug there.

Apr 21 '22 05:04 paulozemek

@paulozemek The default assignment occurs before your intValue assignment:

2. When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

(Emphasis mine)

As an aside:

but the compiler complains that not all fields were set.

In cases like this you can use Unsafe.SkipInit to skip initializing A/R/G/B since as a human you know they were all set when you assigned intValue.

Apr 21 '22 06:04 PathogenDavid

@paulozemek The default assignment occurs before your intValue assignment:

When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

(Emphasis mine)

As an aside:

but the compiler complains that not all fields were set.

In cases like this you can use Unsafe.SkipInit to skip initializing A/R/G/B since as a human you know they were all set when you assigned intValue.

Thanks! It would be nice if the compiler identified that those bytes were already set and didn't even try to initialize them first... but there is a valid solution, so this is great.

Apr 21 '22 20:04 paulozemek

I don't really see much of a reason or benefit in this. I have been saved by the kind of error this seems to want to remove a considerable number of times in the past, when I actually forgot to assign a member of a struct. Why have this when the go-to solution in such a case has always been : this()? Or is there a particular reason why the following shouldn't work?

public struct S
{
    public int X { get => field; set => field = value; }
    public S() : this()
    {
    }
}

Sure, you might have a parameterless constructor, but then a syntax like : default could be introduced instead.

Mar 03 '23 17:03 IS4Code

This sounds great! And would eliminate one of the big annoying differences between defining a type as a struct versus a class.

Jan 26 '24 23:01 jamesford42

csharplang csharplang copied to clipboard

[Proposal]: Auto-default structs

Previous discussion

Adjusting definite assignment

Changes to language specification

Design meetings

csharplang
csharplang copied to clipboard