NetEscapades.EnumGenerators Improve performance for GetNames in .NET 8

In .NET 8, the performance of Enum-related methods has been considerably improved, although they are generally still not as fast as your excellent extension.

Recently, Nick Chapsas promoted your extension and benchmarked it on .NET 8. Interestingly, the source generator's GetNames method came out slower than the native method.

See https://youtu.be/UBY4Y6AykdM?si=VzbUusG5YOu1Ke2X&t=418

Could we work out a solution to improve the performance so it's on par or faster than the native equivalent?

Oct 20 '23 14:10 silkfire

Thanks @silkfire! Huh, that's interesting. I had seen things about the enum perf improvements in .NET 8, so I had already run the tests in the repository myself and found that the extension version was still faster...

This was the test I used:

using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class GetNamesBenchmark
{
#if NETFRAMEWORK
    [Benchmark(Baseline = true)]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] EnumGetNames()
    {
        return Enum.GetNames(typeof(TestEnum));
    }
#else
    [Benchmark(Baseline = true)]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] EnumGetNames()
    {
        return Enum.GetNames<TestEnum>();
    }
#endif

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] ExtensionsGetNames()
    {
        return TestEnumExtensions.GetNames();
    }
}

Which gave these results:

BenchmarkDotNet v0.13.9+228a464e8be6c580ad9408e98f18813f6407fb5a, Windows 10 (10.0.19045.3448/22H2/2022Update)
Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.100-rc.1.23463.5
  [Host]     : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
  Job-VRBYAA : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2

Runtime=.NET 8.0  Toolchain=net8.0

Type	Method	Mean	Error	StdDev	Median	Ratio	RatioSD	Gen0	Allocated	Alloc Ratio
GetNamesBenchmark	EnumGetNames	14.3923 ns	0.2631 ns	0.2054 ns	14.3414 ns	0.729	0.02	0.0229	48 B	1.00
GetNamesBenchmark	ExtensionsGetNames	7.0327 ns	0.1008 ns	0.0842 ns	7.0475 ns	0.357	0.01	0.0229	48 B	1.00

I.e. the native version took 14 ns and the extension took 7 ns, which directly contradicts Nick's findings 😅

Perhaps it's related to some SIMD work they're doing now? I ran these on a pretty old laptop, which only has 2 cores, so if Nick's using something much beefier, maybe it's different.

Alternatively, it could be related to the enum itself. My test enum only had three values:

[EnumExtensions]
public enum TestEnum
{
    First = 0,

    [Display(Name = "2nd")]
    Second = 1,
    Third = 2,
}
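For context on what is being compared, here is a minimal sketch of the kind of GetNames method a source generator could emit for this enum. It is an assumption for illustration, not the generator's verified output: the class name TestEnumExtensionsSketch is hypothetical, and the [EnumExtensions] and [Display] attributes are left out so the snippet compiles on its own.

```csharp
using System;

public enum TestEnum
{
    First = 0,
    Second = 1,
    Third = 2,
}

// Hypothetical stand-in for the generated TestEnumExtensions class.
public static class TestEnumExtensionsSketch
{
    // The member names are compile-time constants, so no reflection or
    // cache lookup is needed. A new array is allocated on every call so
    // that callers cannot mutate shared state.
    public static string[] GetNames()
    {
        return new[]
        {
            nameof(TestEnum.First),
            nameof(TestEnum.Second),
            nameof(TestEnum.Third),
        };
    }
}
```

On a 64-bit runtime, a three-element string[] takes 24 bytes of object header, type pointer and length, plus 3 × 8 bytes of references, which adds up to the 48 B shown in the Allocated column for both methods. Both approaches therefore pay for the same array allocation, and the timing difference comes from how the names are obtained.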

I can try testing with the same one as Nick (Day has 7 values) to see if there's any difference. Any other ideas @Elfocrash? 🤔

Oct 20 '23 15:10 andrewlock

Very interesting indeed! I think it's likely that Mr Chapsas' machine is a bit more modern than the one you tested with, as I'm getting similar results (using the TestEnum).

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22621
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=8.0.100-rc.2.23502.2
  [Host]     : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT
  DefaultJob : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT

Method	Mean	Error	StdDev	Ratio	RatioSD	Gen 0	Allocated
EnumGetNames	8.459 ns	0.0715 ns	0.0634 ns	1.00	0.00	0.0029	48 B
ExtensionsGetNames	12.933 ns	0.2027 ns	0.1896 ns	1.53	0.02	0.0029	48 B

I checked the source code of the native method and it seems it's written using super-optimized code that leverages internal P/Invoke calls to QCall. Something similar is used at least in .NET 6 too. Perhaps that's the only situation where the native method is hard to compete with, but I'm no expert at these micro optimizations so it's hard for me to tell.

Oct 20 '23 17:10 silkfire

That is interesting. I just ran a test with 7 enum values to make sure that wasn't part of it, and I'm still getting the same overall results:

Method	Mean	Error	StdDev	Ratio	Gen0	Allocated	Alloc Ratio
EnumGetNames	19.249 ns	0.3034 ns	0.2534 ns	1.00	0.0383	80 B	1.00
ExtensionsGetNames	9.445 ns	0.1123 ns	0.0995 ns	0.49	0.0383	80 B	1.00

Will have to dig in further 🤔

Oct 20 '23 17:10 andrewlock

Could it be, as you mentioned in your previous reply, that more modern processors leverage hardware intrinsics, which results in faster execution of the native method?

Oct 20 '23 17:10 silkfire

Yeah, I'm assuming that's it (I'll confirm whether I can also repro it on my work machine instead of my old personal laptop). I'm guessing the magic happened in this PR 👀 https://github.com/dotnet/runtime/pull/78580
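One way to compare machines when testing the hardware intrinsics theory is to print what the runtime reports on each one. The following is only a sketch: the HardwareReport class and its Describe method are hypothetical names, and it assumes .NET 7 or later for the Vector128/Vector256 IsHardwareAccelerated properties.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper that summarizes the runtime and the SIMD support
// the JIT can use on the current machine.
public static class HardwareReport
{
    public static string Describe()
    {
        return string.Join(Environment.NewLine, new[]
        {
            $"Runtime: {RuntimeInformation.FrameworkDescription}",
            $"Architecture: {RuntimeInformation.ProcessArchitecture}",
            $"Logical cores: {Environment.ProcessorCount}",
            $"Vector.IsHardwareAccelerated: {Vector.IsHardwareAccelerated}",
            $"Vector<byte>.Count: {Vector<byte>.Count}",
            $"Vector128.IsHardwareAccelerated: {Vector128.IsHardwareAccelerated}",
            $"Vector256.IsHardwareAccelerated: {Vector256.IsHardwareAccelerated}",
            $"Sse42.IsSupported: {Sse42.IsSupported}",
            $"Avx2.IsSupported: {Avx2.IsSupported}",
        });
    }
}
```

Calling Console.WriteLine(HardwareReport.Describe()) on both machines and posting the output alongside the benchmark tables would show whether the two setups actually differ in instruction set support. If both report the same flags, the difference more likely lies elsewhere, for example in the SDK or BenchmarkDotNet versions used.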

Oct 20 '23 17:10 andrewlock