runtime icon indicating copy to clipboard operation
runtime copied to clipboard

Implement DivRem intrinsic for X86

Open huoyaoyuan opened this issue 2 years ago • 19 comments

Closes #27292.

Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.

Not implementing for Mono because I can't understand the code, and this one has special register constraints.

huoyaoyuan avatar Mar 13 '22 09:03 huoyaoyuan

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #27292.

Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.

Not implementing for Mono because I can't understand the code, and this one has special register constraints.

Author: huoyaoyuan
Assignees: -
Labels:

area-CodeGen-coreclr, new-api-needs-documentation

Milestone: -

msftbot[bot] avatar Mar 13 '22 09:03 msftbot[bot]

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #27292.

Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.

Not implementing for Mono because I can't understand the code, and this one has special register constraints.

Author: huoyaoyuan
Assignees: -
Labels:

area-System.Runtime.Intrinsics, new-api-needs-documentation

Milestone: -

msftbot[bot] avatar Mar 13 '22 09:03 msftbot[bot]

Code gen for sample method:

        public static ulong XL(ulong a, ulong b)
        {
            (ulong q, ulong r) = Math.DivRem(a, b);
            return q + r;
        }
G_M52626_IG01:
       mov      qword ptr [rsp+10H], rdx
						;; bbWeight=1    PerfScore 1.00

G_M52626_IG02:
       xor      edx, edx
       mov      rax, rcx
       div      rdx:rax, qword ptr [rsp+10H]
       add      rax, rdx
						;; bbWeight=1    PerfScore 61.75

G_M52626_IG03:
       ret      
						;; bbWeight=1    PerfScore 1.00
; Total bytes of code: 19

In one commit it uses R8 instead of stack, but fails with register confliction in optimized code.

It can be worse if RDX is used in calling convention. For the following method:

        public long DivRemInt64(long a, long b)
        {
            (long q, long r) = Math.DivRem(a, b);
            return q + r;
        }

It saves a copy to stack:

       mov       [rsp+10],rdx
       sar       rdx,3F
       mov       rax,[rsp+10]
       idiv      r8
       add       rax,rdx
       ret
; Total bytes of code 21

And causes micro benchmark regresses. But real world sample can differ by register assignation.

huoyaoyuan avatar Mar 13 '22 10:03 huoyaoyuan

Do I need to implement for Mono in this PR or create an issue to track it? This is an already implemented ISA and used in corelib. Lack of implementation would cause existing tests to fail. /cc @vargaz

huoyaoyuan avatar Mar 13 '22 10:03 huoyaoyuan

Maybe put the usage inside an #ifdef CORECLR for now and disable the relevant tests for mono ?

vargaz avatar Mar 13 '22 15:03 vargaz

I can't prevent LSRA from allocating op3 into EAX, and it causes random assertion failures, no matter using DelayFree or not. If I've understood correctly, both EAX and EDX are exposed by BuildDef, so there's no need to kill them right?

@dotnet/jit-contrib I need some help here.

Edit: I misunderstood RegNum. In case the operand comes from method returned RAX, it's correctly spilled into a temp, but the RegNum of the operand keeps RAX. Realized it after viewing JIT dump in deep.

huoyaoyuan avatar Mar 14 '22 16:03 huoyaoyuan

All the XARCH tests passed. ARM64 failure looks like a timeout. Marking as ready.

huoyaoyuan avatar Mar 15 '22 05:03 huoyaoyuan

The failure looks unrelated, but I'm not 100% sure that I didn't introduce any hole into JIT, since the usage of Math.DivRem is not rare.

huoyaoyuan avatar Mar 15 '22 11:03 huoyaoyuan

I think I've understood how to implement it for Mono. However, currently Mono uses X64 subclass to decide 64bit, which is unsuitable for nint. Leaving for the Mono team to implement it.

huoyaoyuan avatar Mar 16 '22 15:03 huoyaoyuan

Nice! However, It feels weird to me to expose it as a public API - I think it's a trivial (not in terms of this implementation, but in general) thing users expect JIT to handle like other compilers do

EgorBo avatar Mar 16 '22 15:03 EgorBo

To clarify myself - it's fine and probably even great to have GT_INTRINSIC in JIT that this PR does so then we can recognize the div/mod pattern and replace with this intrinsic - I was only concerning about the C# API.

I'm looking forward to seeing this merged

EgorBo avatar Mar 16 '22 15:03 EgorBo

It's capable to do 128bit division, though maybe not useful. The lack of CDQ/CQO method makes the potential result worse than recognizing the div then mod pattern. Is it reliable to recognize >>31?

huoyaoyuan avatar Mar 16 '22 15:03 huoyaoyuan

Nice! However, It feels weird to me to expose it as a public API - I think it's a trivial (not in terms of this implementation, but in general) thing users expect JIT to handle like other compilers do

As from the API review, it is being exposed because x86/x64 div/idiv are special and are quite a bit more complex than just doing div + rem. They also allow 128 / 64, 64 / 32, 32 / 16, and 16 / 8 operations as well as a few other specializations that make them "different enough" that it warrants a special API rather than a general xplat helper.

There are cases where the JIT itself could be improved here as well, certainly, but some of the specializations can be quite complex and this is something that even C/C++ compilers provide as part of their intrinsics repertoires.

tannergooding avatar Mar 16 '22 15:03 tannergooding

I can give this a general review, but its also going to require a review from someone in @dotnet/jit-contrib

tannergooding avatar Mar 23 '22 21:03 tannergooding

@huoyaoyuan, could you resolve the merge conflict and ensure this is up to date with the latest main? It's been a while and we need to ensure CI results are current.

This still needs review from @dotnet/jit-contrib

tannergooding avatar May 19 '22 15:05 tannergooding

Ping for @dotnet/jit-contrib - can we get this in .NET 7?

huoyaoyuan avatar Jul 14 '22 02:07 huoyaoyuan

@kunalspathak, Libraries team needs a code review from JIT team. This is for .NET 7. PTAL. cc @dotnet/jit-contrib.

JulieLeeMSFT avatar Jul 19 '22 19:07 JulieLeeMSFT

@kunalspathak, Libraries team needs a code review from JIT team. This is for .NET 7. PTAL.

I don't think we should merge this in .NET 7. There are many unknowns in the code and some, even I am not familiar with and having them right now feels risky. We can certainly revise and take it once we have a branch for .NET 8.

kunalspathak avatar Jul 20 '22 05:07 kunalspathak

Failures related to https://github.com/dotnet/runtime/issues/76041

kunalspathak avatar Sep 23 '22 16:09 kunalspathak

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #27292.

Some part of code was took from #37928 and #64864. I haven't fully understood all the concepts indeed. Needs some JIT expert to explain them.

Not implementing for Mono because I can't understand the code, and this one has special register constraints.

Author: huoyaoyuan
Assignees: -
Labels:

area-CodeGen-coreclr, new-api-needs-documentation

Milestone: 8.0.0

msftbot[bot] avatar Oct 31 '22 16:10 msftbot[bot]