Improve performance for GetNames in .NET 8
In .NET 8, the Enum-related methods have become considerably faster, although still not as fast as your excellent extension.
Nick Chapsas recently promoted your extension and benchmarked it on .NET 8. Interestingly, the source-generated GetNames method came out slower than the native one.
See https://youtu.be/UBY4Y6AykdM?si=VzbUusG5YOu1Ke2X&t=418
Could we work out a solution to improve the performance so it's on par or faster than the native equivalent?
Thanks @silkfire! Huh, that's interesting. I had seen things about the enum perf improvements in .NET 8, so I had already run the benchmarks in the repository myself and found that the extension version was still faster...
using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class GetNamesBenchmark
{
    // Baseline: the built-in API (non-generic overload on .NET Framework).
#if NETFRAMEWORK
    [Benchmark(Baseline = true)]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] EnumGetNames()
    {
        return Enum.GetNames(typeof(TestEnum));
    }
#else
    [Benchmark(Baseline = true)]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] EnumGetNames()
    {
        return Enum.GetNames<TestEnum>();
    }
#endif

    // Candidate: the source-generated extension method.
    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public string[] ExtensionsGetNames()
    {
        return TestEnumExtensions.GetNames();
    }
}
Which gave these results:
BenchmarkDotNet v0.13.9+228a464e8be6c580ad9408e98f18813f6407fb5a, Windows 10 (10.0.19045.3448/22H2/2022Update)
Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET SDK 8.0.100-rc.1.23463.5
[Host] : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
Job-VRBYAA : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
Runtime=.NET 8.0 Toolchain=net8.0
| Type | Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| GetNamesBenchmark | EnumGetNames | 14.3923 ns | 0.2631 ns | 0.2054 ns | 14.3414 ns | 0.729 | 0.02 | 0.0229 | 48 B | 1.00 |
| GetNamesBenchmark | ExtensionsGetNames | 7.0327 ns | 0.1008 ns | 0.0842 ns | 7.0475 ns | 0.357 | 0.01 | 0.0229 | 48 B | 1.00 |
i.e. the native version took 14 ns and the extension 7 ns, which directly contradicts Nick's findings 😅
Perhaps it's related to some SIMD work they're doing now? I ran these on a pretty old laptop with only 2 cores; if Nick's using something much beefier, maybe it's different.
Alternatively, it could be related to the enum itself. My test enum only has three values:
[EnumExtensions]
public enum TestEnum
{
    First = 0,
    [Display(Name = "2nd")]
    Second = 1,
    Third = 2,
}
I can try testing with the same one as Nick (Day has 7 values) to see if there's any difference. Any other ideas @Elfocrash? 🤔
Very interesting indeed! I think it's likely that Mr Chapsas' machine is a bit more modern than the one you tested with, as I'm getting results similar to his (using the TestEnum):
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22621
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=8.0.100-rc.2.23502.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT
DefaultJob : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Allocated |
|---|---|---|---|---|---|---|---|
| EnumGetNames | 8.459 ns | 0.0715 ns | 0.0634 ns | 1.00 | 0.00 | 0.0029 | 48 B |
| ExtensionsGetNames | 12.933 ns | 0.2027 ns | 0.1896 ns | 1.53 | 0.02 | 0.0029 | 48 B |
I checked the source code of the native method, and it seems to be heavily optimized code that relies on internal QCalls (the runtime's P/Invoke-like mechanism for calling into native code). Something similar is used in .NET 6 too, at least. Perhaps this is the one case where the native method is hard to compete with, but I'm no expert at these micro-optimizations, so it's hard for me to tell.
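For anyone following along, here is a rough sketch of two ways a GetNames-style method can hand back a fresh array on each call. It is only an illustration: `SketchEnum` and `GetNamesSketch` are made-up names, and neither method is claimed to be what the source generator or the runtime actually does. Both allocate an array of the same size, so any timing difference would come from how the elements get written.

```csharp
using System;

// Hypothetical stand-in enum; not the TestEnum from the repository.
public enum SketchEnum { First, Second, Third }

public static class GetNamesSketch
{
    // Built once and kept private so callers can never mutate the cached copy.
    private static readonly string[] CachedNames =
    {
        nameof(SketchEnum.First),
        nameof(SketchEnum.Second),
        nameof(SketchEnum.Third),
    };

    // Strategy A: allocate a new array and store each element individually on every call.
    public static string[] GetNamesFresh() => new[]
    {
        nameof(SketchEnum.First),
        nameof(SketchEnum.Second),
        nameof(SketchEnum.Third),
    };

    // Strategy B: allocate a new array and bulk-copy the cached elements into it.
    public static string[] GetNamesCopied() =>
        new ReadOnlySpan<string>(CachedNames).ToArray();
}
```

Dropping both methods into the benchmark class above would show whether the two strategies diverge on a given machine. Which one wins may well depend on the runtime version and CPU, which would fit the conflicting numbers in this thread.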
That is interesting. I just ran a test with 7 enum values to make sure that wasn't part of it, and I'm still getting the same overall results:
| Method | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|
| EnumGetNames | 19.249 ns | 0.3034 ns | 0.2534 ns | 1.00 | 0.0383 | 80 B | 1.00 |
| ExtensionsGetNames | 9.445 ns | 0.1123 ns | 0.0995 ns | 0.49 | 0.0383 | 80 B | 1.00 |
Will have to dig in further 🤔
Could it be, as you mentioned in your previous reply, that the native method uses hardware intrinsics available on more modern processors and therefore runs faster there?
Yeah, I'm assuming that's it (I'll check whether it also reproduces on my work machine rather than my old personal laptop). I'm guessing the magic happened in this PR 👀 https://github.com/dotnet/runtime/pull/78580
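To help compare machines while testing the hardware-intrinsics guess, a small helper like the one below could be run next to the benchmarks. It is a sketch only: `HardwareReport` is a made-up name, and it reports what the runtime says it can use on the current machine, not what Enum.GetNames actually does internally. It assumes .NET 7 or later for `Vector256.IsHardwareAccelerated`.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class HardwareReport
{
    // Summarises the CPU features the current runtime reports as usable.
    public static string Describe() =>
        $"Arch={RuntimeInformation.ProcessArchitecture}, " +
        $"Cores={Environment.ProcessorCount}, " +
        $"Vector<T> accelerated={Vector.IsHardwareAccelerated}, " +
        $"Vector256 accelerated={Vector256.IsHardwareAccelerated}, " +
        $"AVX2={Avx2.IsSupported}";
}
```

Calling `Console.WriteLine(HardwareReport.Describe());` on each machine gives a like-for-like line to paste into the thread. The first set of results above already lists AVX2 in the BenchmarkDotNet header, while the header from the older BenchmarkDotNet version does not list instruction sets at all, so this would fill that gap.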