Perf: Use `Span.CopyTo<Byte>()` on .NET 7 and above.
This changes UnsafeMemory to use the safe equivalents in .NET 7 and above as they are more performant than the equivalent code and can take advantage of vectorization on hardware that supports it
This optimization exists on at least .NET 7.
Thanks for this. But we don't need the #if I think. We only target .NET 8 at a minimum at this point, and .NET Standard 2.0 / .NET Framework, which also has these APIs through our reference to System.Memory.
Can you just replace the old code with the new code?
And then we should definitely get @neuecc's review (and maybe @pCYSl5EDgo) because they've done a lot of micro-optimization work in this library.
Alright. It looks like .NET 7 and below doesn't have the optimization anyway potentially (it replaces with a call to Buffer.Memmove, indirection to GetSpan removed for sake of example).
Output on .NET 8
A.UnsafeMemory64:WriteRaw31_New(System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]) (FullOpts):
push rbp
vzeroupper
mov rbp, rsp
cmp ecx, 31
jb SHORT G_M17461_IG05
cmp esi, 31
jb SHORT G_M17461_IG06
vmovdqu xmm0, xmmword ptr [rdx]
vmovdqu xmm1, xmmword ptr [rdx+0x0F]
vmovdqu xmmword ptr [rdi], xmm0
vmovdqu xmmword ptr [rdi+0x0F], xmm1
pop rbp
ret
G_M17461_IG05: ;; offset=0x0025
call [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
int3
G_M17461_IG06: ;; offset=0x002C
call [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
int3
vs. .NET 7
A.UnsafeMemory64:WriteRaw31_New(System.Span`1[ubyte],System.ReadOnlySpan`1[ubyte]):
push rbp
mov rbp, rsp
cmp ecx, 31
jb SHORT G_M17461_IG04
cmp esi, 31
jb SHORT G_M17461_IG05
mov rsi, rdx
mov edx, 31
call [System.Buffer:Memmove(byref,byref,ulong)]
nop
pop rbp
ret
G_M17461_IG04:
call [System.ThrowHelper:ThrowArgumentOutOfRangeException()]
int3
G_M17461_IG05:
call [System.ThrowHelper:ThrowArgumentException_DestinationTooShort()]
int3
``` which still may be faster but hard to tell for sure without testing more thoroughly.
Thanks. Two more thoughts.
So long as you're looking at native assembly emitted, can you look at .NET Framework? That's important too.
If this really is going to be the way to go, I guess we can get rid of all the many methods that you're changing too, as they were named and implemented specially for the length of the span, and that evidently isn't necessary any more since their implementations are now identical.
But feel free to wait for @neuecc's take before doing more work, since he may know something about why this isn't doable in the first place. Unity is another target runtime and it may be important there that the code remains as-is, for example.
This seems like a remnant from an era before Span existed.
I think the original original received byte[].
In other words, I think the changes are fine.
In the end, it goes to SpanHelpers.memmove? https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.ByteMemOps.cs,36 The original goal is that if we know the length in advance, we can avoid branching. The code size also becomes smaller, which should allow for inlining. There's no room for vectorization when dealing with small data. Instead of vectorizing, we are unrolling with a fixed approach However, since memmove is intrinsic, would its behavior in .NET differ from the code's appearance? It's difficult to immediately determine whether it's better than a handwritten implementation (although not needing "fixed" is good).
Minimal benchmarking shows that Span is better than the original. However, manual fixed value optimisation using Unsafe.WriteUnaligned was even better.