runtime
runtime copied to clipboard
Implement DivRem intrinsic for X86
Closes #27292.
Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.
Not implementing for Mono because I can't understand the code, and this one has special register constraints.
Note regarding the new-api-needs-documentation
label:
This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.
Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.
Issue Details
Closes #27292.
Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.
Not implementing for Mono because I can't understand the code, and this one has special register constraints.
Author: | huoyaoyuan |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | - |
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.
Issue Details
Closes #27292.
Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.
Not implementing for Mono because I can't understand the code, and this one has special register constraints.
Author: | huoyaoyuan |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | - |
Code gen for sample method:
public static ulong XL(ulong a, ulong b)
{
(ulong q, ulong r) = Math.DivRem(a, b);
return q + r;
}
G_M52626_IG01:
mov qword ptr [rsp+10H], rdx
;; bbWeight=1 PerfScore 1.00
G_M52626_IG02:
xor edx, edx
mov rax, rcx
div rdx:rax, qword ptr [rsp+10H]
add rax, rdx
;; bbWeight=1 PerfScore 61.75
G_M52626_IG03:
ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code: 19
In one commit it uses R8 instead of stack, but fails with register confliction in optimized code.
It can be worse if RDX
is used in calling convention. For the following method:
public long DivRemInt64(long a, long b)
{
(long q, long r) = Math.DivRem(a, b);
return q + r;
}
It saves a copy to stack:
mov [rsp+10],rdx
sar rdx,3F
mov rax,[rsp+10]
idiv r8
add rax,rdx
ret
; Total bytes of code 21
And causes micro benchmark regresses. But real world sample can differ by register assignation.
Do I need to implement for Mono in this PR or create an issue to track it? This is an already implemented ISA and used in corelib. Lack of implementation would cause existing tests to fail. /cc @vargaz
Maybe put the usage inside an #ifdef CORECLR for now and disable the relevant tests for mono ?
I can't prevent LSRA from allocating op3 into EAX, and it causes random assertion failures, no matter using DelayFree
or not.
If I've understood correctly, both EAX and EDX are exposed by BuildDef, so there's no need to kill them right?
@dotnet/jit-contrib I need some help here.
Edit: I misunderstood RegNum. In case the operand comes from method returned RAX, it's correctly spilled into a temp, but the RegNum of the operand keeps RAX. Realized it after viewing JIT dump in deep.
All the XARCH tests passed. ARM64 failure looks like a timeout. Marking as ready.
The failure looks unrelated, but I'm not 100% sure that I didn't introduce any hole into JIT, since the usage of Math.DivRem
is not rare.
I think I've understood how to implement it for Mono. However, currently Mono uses X64
subclass to decide 64bit, which is unsuitable for nint
. Leaving for the Mono team to implement it.
Nice! However, It feels weird to me to expose it as a public API - I think it's a trivial (not in terms of this implementation, but in general) thing users expect JIT to handle like other compilers do
To clarify myself - it's fine and probably even great to have GT_INTRINSIC in JIT that this PR does so then we can recognize the div/mod pattern and replace with this intrinsic - I was only concerning about the C# API.
I'm looking forward to seeing this merged
It's capable to do 128bit division, though maybe not useful.
The lack of CDQ/CQO method makes the potential result worse than recognizing the div then mod pattern. Is it reliable to recognize >>31
?
Nice! However, It feels weird to me to expose it as a public API - I think it's a trivial (not in terms of this implementation, but in general) thing users expect JIT to handle like other compilers do
As from the API review, it is being exposed because x86/x64 div
/idiv
are special and are quite a bit more complex than just doing div + rem
. They also allow 128 / 64
, 64 / 32
, 32 / 16
, and 16 / 8
operations as well as a few other specializations that make them "different enough" that it warrants a special API rather than a general xplat helper.
There are cases where the JIT itself could be improved here as well, certainly, but some of the specializations can be quite complex and this is something that even C/C++ compilers provide as part of their intrinsics repertoires.
I can give this a general review, but its also going to require a review from someone in @dotnet/jit-contrib
@huoyaoyuan, could you resolve the merge conflict and ensure this is up to date with the latest main? It's been a while and we need to ensure CI results are current.
This still needs review from @dotnet/jit-contrib
Ping for @dotnet/jit-contrib - can we get this in .NET 7?
@kunalspathak, Libraries team needs a code review from JIT team. This is for .NET 7. PTAL. cc @dotnet/jit-contrib.
@kunalspathak, Libraries team needs a code review from JIT team. This is for .NET 7. PTAL.
I don't think we should merge this in .NET 7. There are many unknowns in the code and some, even I am not familiar with and having them right now feels risky. We can certainly revise and take it once we have a branch for .NET 8.
Failures related to https://github.com/dotnet/runtime/issues/76041
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Issue Details
Closes #27292.
Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.
Not implementing for Mono because I can't understand the code, and this one has special register constraints.
Author: | huoyaoyuan |
---|---|
Assignees: | - |
Labels: |
|
Milestone: | 8.0.0 |