roslyn icon indicating copy to clipboard operation
roslyn copied to clipboard

CodeGen for defaulted collection expression elements (Span via InlineArray)

Open bernd5 opened this issue 1 year ago • 4 comments

Version Used:

Steps to Reproduce:

Compile the following code:


using System;

Span<int> i = [0, 1, 0];

Expected Behavior: I would assume the assignment of elements with a default value could be omitted.

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray3<int> buffer = default(<>y__InlineArray3<int>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray3<int>, int>(ref buffer, 3);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct <>y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}

Actual Behavior:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray3<int> buffer = default(<>y__InlineArray3<int>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 0) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray3<int>, int>(ref buffer, 2) = 0;
        <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray3<int>, int>(ref buffer, 3);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct <>y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}

bernd5 avatar Jun 27 '24 13:06 bernd5

I would assume the assignment of elements with a default value could be omitted.

Why would you assume that? The user has explicitly asked the compiler to assign a value here, why should the compiler omit that?

Consider a case where the user is say getting an array back from a pool and wants ensure parts of it are initialized. In that case the default value certainly cannot be omitted.

jaredpar avatar Jun 27 '24 23:06 jaredpar

In general you are absolutely right. I mean the spefific Span lowering via inline array. Inline Array structs are already default initialized after construction. So these assignments are noops which can be removed / optimized away.

bernd5 avatar Jun 28 '24 05:06 bernd5

What if the user has disabled locals init behavior and are using the defaults to force initialization?

jaredpar avatar Jun 28 '24 18:06 jaredpar

Right, then it can't be optimized out. The result should be the same.

bernd5 avatar Jun 28 '24 18:06 bernd5

Right, then it can't be optimized out. The result should be the same.

@jaredpar could you explain the situation when it can't be optimized out? The inline array is constructed by the compiler - not by the user, and the compiler emits the initobj instruction which default initializes the struct.

In my eyes the assignment of constant default values can be omitted always for inline array to span construction.

This code can be

using System;

foreach (var item in (Span<int>)[0, 1, 0]){
  Console.WriteLine(item);
}

lowered to:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

internal class Program
{
    [SkipLocalsInit]
    private static void Main(string[] args)
    {
        y__InlineArray3<int> buffer = default(y__InlineArray3<int>);
        //not needed
        //PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 0) = 0;
        PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 1) = 1;
        //not needed
        //PrivateImplementationDetails.InlineArrayElementRef<y__InlineArray3<int>, int>(ref buffer, 2) = 0;
        Span<int> s = PrivateImplementationDetails.InlineArrayAsSpan<y__InlineArray3<int>, int>(ref buffer, 3);
        foreach (var item in s){
            Console.WriteLine(item);
        }
    }
}

[CompilerGenerated]
internal sealed class PrivateImplementationDetails
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(3)]
internal struct y__InlineArray3<T>
{
    [CompilerGenerated]
    private T _element0;
}

bernd5 avatar Jul 02 '24 07:07 bernd5

could you explain the situation when it can't be optimized out?

I've listed a few already in the issue. Which one is unclear?

In my eyes the assignment of constant default values can be omitted always for inline array to span construction.

Optimizations need to be treated like features. The burden is to demonstrate that it is safe, not intuit that it's safe. For example explicitly listing the scenarios in which the optimization will occur and why specifically it's legal in that scenario.

For example in the Span scenario you have to demonstrate that the memory which is not written will be guaranteed to have the default value already.

Also: optimizations are not free. They are a significant source of regressions. That means there is also the burden of demonstrating this optimization is valuable / worth the risk trade off. I've seen no evidence of that in this issue.

jaredpar avatar Jul 03 '24 15:07 jaredpar

What if the user has disabled locals init behavior and are using the defaults to force initialization?

That doesn't matter because the inline-array-struct is explicitly defaulted. If it wouldn't I agree.

Why would you assume that? The user has explicitly asked the compiler to assign a value here, why should the compiler omit that? Consider a case where the user is say getting an array back from a pool and wants ensure parts of it are initialized. In that case the default value certainly cannot be omitted.

That is a very different case. I talk only about span-collection-expression lowering via inline-array, nothing else.

I added a draft-PR already which seems to fail only for some code gen builder tests, where a temporary inline array is created for the collection builder.

For heap / regular arrays this is already implemented, see: EmitArrayInitializer. I think we should do that for inline array, too. Especially because they are used only because of performance.

bernd5 avatar Jul 03 '24 18:07 bernd5

I feel like you haven't fully responded to my last comment

  1. What are the very specific cases this would apply to and why is it demonstrably safe in all those cases?
  2. Why should we do this optimization? Every change, particularly optimizations, carry risk. For optimizations we need real data to motivate us to take these changes. I'm very skeptical on this particular point.

Lacking these two items this issue and accompanying PRs cannot move forward.

jaredpar avatar Jul 03 '24 18:07 jaredpar

What are the very specific cases this would apply to and why is it demonstrably safe in all those cases?

I mean this use case:

using System;
using System.IO;

Stream someStream = null;
Span<byte> smallBuffer = [0,0,0,0,0,0,0,0,0,0];
someStream.Read(smallBuffer);

As a user I expect that an inline array is constructed without any assigments. But as you see, today the compiler emits:

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray9<byte> buffer = default(<>y__InlineArray9<byte>);
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 0) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 1) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 2) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 3) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 4) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 5) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 6) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 7) = 0;
        <PrivateImplementationDetails>.InlineArrayElementRef<<>y__InlineArray9<byte>, byte>(ref buffer, 8) = 0;
        Span<byte> buffer2 = <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray9<byte>, byte>(ref buffer, 9);
        ((Stream)null).Read(buffer2);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(9)]
internal struct <>y__InlineArray9<T>
{
    [CompilerGenerated]
    private T _element0;
}

And I want:

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
[module: RefSafetyRules(11)]

[CompilerGenerated]
internal class Program
{
    private static void <Main>$(string[] args)
    {
        <>y__InlineArray9<byte> buffer = default(<>y__InlineArray9<byte>);
        Span<byte> buffer2 = <PrivateImplementationDetails>.InlineArrayAsSpan<<>y__InlineArray9<byte>, byte>(ref buffer, 9);
        ((Stream)null).Read(buffer2);
    }
}

[CompilerGenerated]
internal sealed class <PrivateImplementationDetails>
{
    internal static Span<TElement> InlineArrayAsSpan<TBuffer, TElement>(ref TBuffer buffer, int length)
    {
        return MemoryMarshal.CreateSpan(ref Unsafe.As<TBuffer, TElement>(ref buffer), length);
    }

    internal static ref TElement InlineArrayElementRef<TBuffer, TElement>(ref TBuffer buffer, int index)
    {
        return ref Unsafe.Add(ref Unsafe.As<TBuffer, TElement>(ref buffer), index);
    }
}

[StructLayout(LayoutKind.Auto)]
[InlineArray(9)]
internal struct <>y__InlineArray9<T>
{
    [CompilerGenerated]
    private T _element0;
}

Why should we do this optimization? Every change, particularly optimizations, carry risk. For optimizations we need real data to motivate us to take these changes. I'm very skeptical on this particular point.

In my eyes it is obvious that those assignments are unnecessary. The inline array is already defaulted (though in this case I don't care about initialization at all).

bernd5 avatar Jul 03 '24 19:07 bernd5

why is it demonstrably safe in all those cases

Because we do:

default(<>y__InlineArray9<byte>)

bernd5 avatar Jul 03 '24 19:07 bernd5

If I would use stackalloc the generated code looks much better, but it coulde issues with inlining or in loops...

using System;
using System.IO;

Stream someStream = null;
Span<byte> smallBuffer = stackalloc byte[9];
someStream.Read(smallBuffer);

using an inline array under the hood seems to be the better solution because it is safe.

bernd5 avatar Jul 03 '24 19:07 bernd5

In my eyes it is obvious that those assignments are unnecessary

That is not real data. Data is profiles showing that the optimization actually improves performance. Depending on intuition like this can, and does, lead to real world regressions. Further it really helps to see data around impact. For example how often does this pattern happen in the real world? An optimization might be demonstrably better but if no real world code hits it then what is the point in doing it?

I mean this use case:

There needs to be more specifics than a single case. It needs to be much more detailed about when an optimization happens. For example this optimization cannot happen when init locals is disable.

Further it needs to touch on what other implications this has. Consider for example #73269 where we want to be smarter about temp re-use in the compiler (leads to stack size savings in real world scenarios). This optimization directly conflicts with that one because this optimization is brittle in the face of temporary re-use as it would allow uninitialized data to leak through. Items like that need to be discussed.

jaredpar avatar Jul 08 '24 13:07 jaredpar

I actually did a benchmark for this specific use case and observed that such unnecessary assignments to defaulted struct-fields are optimized away by the runtime (for Release builds) as long as the struct construction is in the same method / stack-frame (without inlining - which is the case here).

See for example:

using System;

var test = new Test();
test.A = null;
test.B = default;
test.B = default;
Console.WriteLine(test.A);
Console.WriteLine(test.B);

struct Test{
  public string A;   
  public int B;
  public int C;
}

This is compiled to:

Program..ctor()
    L0000: ret

Program.<Main>$(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: xor ecx, ecx
    L0005: call dword ptr [0x13756f58]
    L000b: xor ecx, ecx
    L000d: call dword ptr [0x13756ee0]
    L0013: pop ebp
    L0014: ret

Which is quite perfect!

So the real benefit woule be (only) less work for the runtime optimization and binary-size reduction.

Maybe strange is that such an optimization is not implemented for locally constructed arrays or classes.

E.g.:

using System;

var test = new Test();
test.A = null;
test.B = default;
test.B = default;
Console.WriteLine(test.A);
Console.WriteLine(test.B);

class Test{
  public string A;   
  public int B;
  public int C;
}

is compiled to:

; Core CLR 8.0.624.26715 on x86

Program..ctor()
    L0000: ret

Program.<Main>$(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: mov ecx, 0x277aca2c
    L0009: call 0x06ec300c
    L000e: mov esi, eax
    L0010: xor ecx, ecx
    L0012: mov [esi+4], ecx
    L0015: mov [esi+8], ecx
    L0018: mov [esi+8], ecx
    L001b: mov ecx, [esi+4]
    L001e: call dword ptr [0x13756f58]
    L0024: mov ecx, [esi+8]
    L0027: call dword ptr [0x13756ee0]
    L002d: pop esi
    L002e: pop ebp
    L002f: ret

Test..ctor()
    L0000: ret

I guess the runtime assumes always other references to the heap storage. And because such optimizations are not handled in the runtime I think the IL CodeGen was adjusted (for arrays).

So I would understand if you say - well it is optimized enough by the runtime.

bernd5 avatar Jul 08 '24 14:07 bernd5

The benchmark code I used was:

using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

var config = DefaultConfig.Instance;
var summary = BenchmarkRunner.Run <TestInlineArray>(config, args);

[DisassemblyDiagnoser]
//[RPlotExporter]
[SimpleJob(runtimeMoniker: BenchmarkDotNet.Jobs.RuntimeMoniker.Net80)]
//[SimpleJob(runtimeMoniker: BenchmarkDotNet.Jobs.RuntimeMoniker.Mono)]
public class TestInlineArray
{
    [Benchmark]
    public void CreateEmptySpanWithCollectionExpression()
    {
        Span<byte> bytes = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0];

        UseSpan(bytes);
    }

    [Benchmark]
    public void CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization()
    {
        var array = CreateInlineArraySoTheJitDoesNotSeeItIsDefaulted();
        array[0] = 0;
        array[1] = 0;
        array[2] = 0;
        array[3] = 0;
        array[4] = 0;
        array[5] = 0;
        array[6] = 0;
        array[7] = 0;
        array[8] = 0;
        array[9] = 0;

        array[10] = 0;
        array[11] = 0;
        array[12] = 0;
        array[13] = 0;
        array[14] = 0;
        array[15] = 0;
        array[16] = 0;
        array[17] = 0;
        array[18] = 0;
        array[19] = 0;

        array[20] = 0;
        array[21] = 0;
        array[22] = 0;
        array[23] = 0;
        array[24] = 0;
        array[25] = 0;
        array[26] = 0;
        array[27] = 0;
        array[28] = 0;
        array[29] = 0;

        UseSpan(array);

        [MethodImpl(MethodImplOptions.NoInlining)]
        InlineArrayX CreateInlineArraySoTheJitDoesNotSeeItIsDefaulted()
        {
            return new();
        }
    }

    [Benchmark]
    public void CreateEmptySpanWithInlineArrayDirect()
    {
        var array = new InlineArrayX();
        Span<byte> bytes = array;
        UseSpan(bytes);
    }

    [Benchmark]
    public void CreateEmptySpanWithStackAlloc()
    {
        Span<byte> bytes = stackalloc byte[30];
        UseSpan(bytes);
    }

    private void UseSpan(Span<byte> span)
    {
        foreach (var s in span)
        {
            if (s != 0)
            {
                throw new Exception();
            }
        }
    }

    [InlineArray(30)]
    private struct InlineArrayX
    {
        public byte Data;
    }
}

CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization shows what happens with disabled runtime optimization.

The result was:

| Method                                                                                | Mean     | Error    | StdDev   | Code Size |
|-------------------------------------------------------------------------------------- |---------:|---------:|---------:|----------:|
| CreateEmptySpanWithCollectionExpression                                               | 16.01 ns | 0.355 ns | 0.332 ns |     158 B |
| CreateEmptySpanWithInlineArrayLikeTheCompilerDoesDirectAndWithDisabledGitOptimization | 24.14 ns | 0.313 ns | 0.261 ns |     340 B |
| CreateEmptySpanWithInlineArrayDirect                                                  | 15.31 ns | 0.163 ns | 0.136 ns |     157 B |
| CreateEmptySpanWithStackAlloc                                                         | 16.22 ns | 0.168 ns | 0.157 ns |     188 B |

bernd5 avatar Jul 08 '24 14:07 bernd5

Thansk fro the benchmark. Closing for now because it seems like the runtime is handling these cases fairly good today.

Can re-open if we find this is not the case or find real world cases where this is a problem.

jaredpar avatar Jul 11 '24 15:07 jaredpar