runtime icon indicating copy to clipboard operation
runtime copied to clipboard

Enable JitDisasm in Release

Open EgorBo opened this issue 2 years ago • 39 comments

This PR enables DOTNET_JitDisasm for Release config.

It doesn't print lcl tables, EH and gc info (for now).

Adds ~30kb to the binary size of jit.

The only noticeable impact on Release is that instrDesc struct is now 8bytes bigger (_idDebugOnlyInfo) - but I might fix it with a standalone hash-table which will be populated only when JitDisasm variable is not empty.

Example:

public class Tests
{
    static int staticField;
    string instanceField;

    public virtual void VirtualCall() {}

    public void Test(string[] args, IDisposable d)
    {
        // control flow
        if (args?.Length == 0)
        {
            Console.WriteLine("field: " + staticField);
            Console.WriteLine(instanceField);
        }
        else
        {
            // SIMD (data section)
            Console.WriteLine(Vector128.Create(1, 2, 3, 4));
        }

        // Virtual call
        VirtualCall();

        // Interface call
        d?.Dispose();

        // Jump table
        switch (args.Length)
        {
            case 1:
                Console.WriteLine("1");
                break;
            case 2:
                Console.WriteLine("2");
                break;
            case 3:
                Console.WriteLine("3");
                break;
            case 4:
                Console.WriteLine("5");
                break;
            case 6:
                Console.WriteLine("7");
                break;
        }
        Console.ReadKey();
    }
}

DOTNET_JitDisasm=Test: (DOTNET_JitDiffableDasm is 0 by default, hence, raw bytes, addresses)

; Assembly listing for method Tests:Test(ref,IDisposable):this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper
       488BF1               mov      rsi, rcx
       488BFA               mov      rdi, rdx
       498BD8               mov      rbx, r8

G_M000_IG02:                ;; offset=0013H
       4885FF               test     rdi, rdi
       743D                 je       SHORT G_M000_IG04

G_M000_IG03:                ;; offset=0018H
       837F0800             cmp      dword ptr [rdi+8], 0
       7537                 jne      SHORT G_M000_IG04
       8B0DF8B40C00         mov      ecx, dword ptr [(reloc 0x7ffd9745c06c)]
       FF15560A1100         call     [Number:Int32ToDecStr(int):String]
       488BD0               mov      rdx, rax
       48B9D820804A0C020000 mov      rcx, 0x20C4A8020D8
       488B09               mov      rcx, gword ptr [rcx]
       FF15D0100900         call     [String:Concat(String,String):String]
       488BC8               mov      rcx, rax
       FF15FFAE0F00         call     [Console:WriteLine(String)]
       488B4E08             mov      rcx, gword ptr [rsi+8]
       FF15F5AE0F00         call     [Console:WriteLine(String)]
       EB27                 jmp      SHORT G_M000_IG05

G_M000_IG04:                ;; offset=0055H
       48B978D64797FD7F0000 mov      rcx, 0x7FFD9747D678
       E8FCAEAE5F           call     CORINFO_HELP_NEWSFAST
       C5F91005E4000000     vmovupd  xmm0, xmmword ptr [reloc @RWD00]
       C5F9114008           vmovupd  xmmword ptr [rax+8], xmm0
       488BC8               mov      rcx, rax
       FF15B6AE0F00         call     [Console:WriteLine(Object)]

G_M000_IG05:                ;; offset=007AH
       488BCE               mov      rcx, rsi
       488B06               mov      rax, qword ptr [rsi]
       488B4040             mov      rax, qword ptr [rax+64]
       FF5020               call     [rax+32]Tests:VirtualCall():this
       4885DB               test     rbx, rbx
       7410                 je       SHORT G_M000_IG07

G_M000_IG06:                ;; offset=008CH
       488BCB               mov      rcx, rbx
       49BB08001E97FD7F0000 mov      r11, 0x7FFD971E0008
       41FF13               call     [r11]IDisposable:Dispose():this

G_M000_IG07:                ;; offset=009CH
       8B4F08               mov      ecx, dword ptr [rdi+8]
       FFC9                 dec      ecx
       83F905               cmp      ecx, 5
       0F877F000000         ja       G_M000_IG13
       8BC9                 mov      ecx, ecx
       488D05AD000000       lea      rax, [reloc @RWD16]
       8B0488               mov      eax, dword ptr [rax+4*rcx]
       488D1556FFFFFF       lea      rdx, G_M000_IG02
       4803C2               add      rax, rdx
       FFE0                 jmp      rax

G_M000_IG08:                ;; offset=00C2H
       48B9E020804A0C020000 mov      rcx, 0x20C4A8020E0
       488B09               mov      rcx, gword ptr [rcx]
       FF1573AE0F00         call     [Console:WriteLine(String)]
       EB52                 jmp      SHORT G_M000_IG13

G_M000_IG09:                ;; offset=00D7H
       48B9E820804A0C020000 mov      rcx, 0x20C4A8020E8
       488B09               mov      rcx, gword ptr [rcx]
       FF155EAE0F00         call     [Console:WriteLine(String)]
       EB3D                 jmp      SHORT G_M000_IG13

G_M000_IG10:                ;; offset=00ECH
       48B9F020804A0C020000 mov      rcx, 0x20C4A8020F0
       488B09               mov      rcx, gword ptr [rcx]
       FF1549AE0F00         call     [Console:WriteLine(String)]
       EB28                 jmp      SHORT G_M000_IG13

G_M000_IG11:                ;; offset=0101H
       48B9F820804A0C020000 mov      rcx, 0x20C4A8020F8
       488B09               mov      rcx, gword ptr [rcx]
       FF1534AE0F00         call     [Console:WriteLine(String)]
       EB13                 jmp      SHORT G_M000_IG13

G_M000_IG12:                ;; offset=0116H
       48B90021804A0C020000 mov      rcx, 0x20C4A802100
       488B09               mov      rcx, gword ptr [rcx]
       FF151FAE0F00         call     [Console:WriteLine(String)]

G_M000_IG13:                ;; offset=0129H
       488D4C2420           lea      rcx, bword ptr
       33D2                 xor      edx, edx
       FF158AEE0F00         call     [ConsolePal:ReadKey(bool):ConsoleKeyInfo]
       90                   nop

G_M000_IG14:                ;; offset=0137H
       4883C430             add      rsp, 48
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret

RWD00   dq      0000000200000001h, 0000000400000003h
RWD16   dd      000000AFh ; case G_M000_IG08
        dd      000000C4h ; case G_M000_IG09
        dd      000000D9h ; case G_M000_IG10
        dd      000000EEh ; case G_M000_IG11
        dd      00000116h ; case G_M000_IG13
        dd      00000103h ; case G_M000_IG12

; Total bytes of code 319

EgorBo avatar Aug 04 '22 11:08 EgorBo

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

This PR enables DOTNT_JitDisasm, DOTNET_NgenDisasm and DOTNET_JitDiffableDasm for Release config.

It doesn't print lcl tables and gc info (for now).

Adds ~30kb to the binary size of jit.

The only noticeable impact on Release is that instrDesc struct is now 8bytes bigger (_idDebugOnlyInfo) - but I might fix it with a standalone hash-table which will be populated only when JitDisasm variable is not empty.

Example:

public class Tests
{
    static int staticField;
    string instanceField;

    public virtual void VirtualCall() {}

    public void Test(string[] args, IDisposable d)
    {
        // control flow
        if (args?.Length == 0)
        {
            Console.WriteLine("field: " + staticField);
            Console.WriteLine(instanceField);
        }
        else
        {
            // SIMD (data section)
            Console.WriteLine(Vector128.Create(1, 2, 3, 4));
        }

        // Virtual call
        VirtualCall();

        // Interface call
        d?.Dispose();

        // Jump table
        switch (args.Length)
        {
            case 1:
                Console.WriteLine("1");
                break;
            case 2:
                Console.WriteLine("2");
                break;
            case 3:
                Console.WriteLine("3");
                break;
            case 4:
                Console.WriteLine("5");
                break;
            case 6:
                Console.WriteLine("7");
                break;
        }
        Console.ReadKey();
    }
}
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
IG_01:
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper
       488BF1               mov      rsi, rcx
       488BFA               mov      rdi, rdx
       498BD8               mov      rbx, r8
IG_02:
       4885FF               test     rdi, rdi
       743D                 je       SHORT G_M000_IG04
IG_03:
       837F0800             cmp      dword ptr [rdi+8], 0
       7537                 jne      SHORT G_M000_IG04
       8B0DF8B40C00         mov      ecx, dword ptr [(reloc 0x7ffd2d2dc06c)]
       FF15560A1100         call     [Number:Int32ToDecStr(int):String]
       488BD0               mov      rdx, rax
       48B9D820C0EF74020000 mov      rcx, 0x274EFC020D8
       488B09               mov      rcx, gword ptr [rcx]
       FF15D0100900         call     [String:Concat(String,String):String]
       488BC8               mov      rcx, rax
       FF15FFAE0F00         call     [Console:WriteLine(String)]
       488B4E08             mov      rcx, gword ptr [rsi+8]
       FF15F5AE0F00         call     [Console:WriteLine(String)]
       EB27                 jmp      SHORT G_M000_IG05
IG_04:
       48B978D62F2DFD7F0000 mov      rcx, 0x7FFD2D2FD678
       E8FCAEAF5F           call     CORINFO_HELP_NEWSFAST
       C5F91005E4000000     vmovupd  xmm0, xmmword ptr [reloc @RWD00]
       C5F9114008           vmovupd  xmmword ptr [rax+8], xmm0
       488BC8               mov      rcx, rax
       FF15B6AE0F00         call     [Console:WriteLine(Object)]
IG_05:
       488BCE               mov      rcx, rsi
       488B06               mov      rax, qword ptr [rsi]
       488B4040             mov      rax, qword ptr [rax+64]
       FF5020               call     [rax+32]Tests:VirtualCall():this
       4885DB               test     rbx, rbx
       7410                 je       SHORT G_M000_IG07
IG_06:
       488BCB               mov      rcx, rbx
       49BB0800062DFD7F0000 mov      r11, 0x7FFD2D060008
       41FF13               call     [r11]IDisposable:Dispose():this
IG_07:
       8B4F08               mov      ecx, dword ptr [rdi+8]
       FFC9                 dec      ecx
       83F905               cmp      ecx, 5
       0F877F000000         ja       G_M000_IG13
       8BC9                 mov      ecx, ecx
       488D05AD000000       lea      rax, [reloc @RWD16]
       8B0488               mov      eax, dword ptr [rax+4*rcx]
       488D1556FFFFFF       lea      rdx, G_M000_IG02
       4803C2               add      rax, rdx
       FFE0                 jmp      rax
IG_08:
       48B9E020C0EF74020000 mov      rcx, 0x274EFC020E0
       488B09               mov      rcx, gword ptr [rcx]
       FF1573AE0F00         call     [Console:WriteLine(String)]
       EB52                 jmp      SHORT G_M000_IG13
IG_09:
       48B9E820C0EF74020000 mov      rcx, 0x274EFC020E8
       488B09               mov      rcx, gword ptr [rcx]
       FF155EAE0F00         call     [Console:WriteLine(String)]
       EB3D                 jmp      SHORT G_M000_IG13
IG_10:
       48B9F020C0EF74020000 mov      rcx, 0x274EFC020F0
       488B09               mov      rcx, gword ptr [rcx]
       FF1549AE0F00         call     [Console:WriteLine(String)]
       EB28                 jmp      SHORT G_M000_IG13
IG_11:
       48B9F820C0EF74020000 mov      rcx, 0x274EFC020F8
       488B09               mov      rcx, gword ptr [rcx]
       FF1534AE0F00         call     [Console:WriteLine(String)]
       EB13                 jmp      SHORT G_M000_IG13
IG_12:
       48B90021C0EF74020000 mov      rcx, 0x274EFC02100
       488B09               mov      rcx, gword ptr [rcx]
       FF151FAE0F00         call     [Console:WriteLine(String)]
IG_13:
       488D4C2420           lea      rcx, bword ptr
       33D2                 xor      edx, edx
       FF158AEE0F00         call     [ConsolePal:ReadKey(bool):ConsoleKeyInfo]
       90                   nop
IG_14:
       4883C430             add      rsp, 48
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret
RWD00   dq      0000000200000001h, 0000000400000003h
RWD16   dd      000000AFh ; case IG_08
        dd      000000C4h ; case IG_09
        dd      000000D9h ; case IG_10
        dd      000000EEh ; case IG_11
        dd      00000116h ; case IG_13
        dd      00000103h ; case IG_12
; Total bytes of code 319
Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

msftbot[bot] avatar Aug 04 '22 11:08 msftbot[bot]

This would be super helpful.

stephentoub avatar Aug 04 '22 11:08 stephentoub

Does this PR also cover crossgen2?

hez2010 avatar Aug 04 '22 11:08 hez2010

Does this PR also cover crossgen2?

Yes works for R2R. And, probably, for NativeAOT as well but afair it redirects stdout cc @MichalStrehovsky

EgorBo avatar Aug 04 '22 11:08 EgorBo

One noticeable advantage over sharplab for end-users is the ability to print names for helper calls, e.g.

public class C
{
    object o;
    
    public void M(object[] a)
    {
        a[0] = 42;
        o = a;
    }
}

Sharplab:

image

JitDisasm:

; Assembly listing for method C:M(ref):this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data

G_M000_IG01:              ;; offset=0000H
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488BF9               mov      rdi, rcx
       488BF2               mov      rsi, rdx

G_M000_IG02:              ;; offset=000CH
       48B958E4DC2CFD7F0000 mov      rcx, 0x7FFD2CDCE458
       E835AFB15F           call     CORINFO_HELP_NEWSFAST
       C740082A000000       mov      dword ptr [rax+8], 42
       4C8BC0               mov      r8, rax
       488BCE               mov      rcx, rsi
       33D2                 xor      edx, edx
       E891F4FFFF           call     CORINFO_HELP_ARRADDR_ST
       488D4F08             lea      rcx, bword ptr [rdi+8]
       488BD6               mov      rdx, rsi
       E875F4E2FF           call     CORINFO_HELP_ASSIGN_REF
       90                   nop

G_M000_IG03:              ;; offset=003CH
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret

; Total bytes of code 67

Perhaps, it will be useful for sharplab too

EgorBo avatar Aug 04 '22 12:08 EgorBo

@EgorBo do you plan to merge this to .NET 7?

JulieLeeMSFT avatar Aug 04 '22 13:08 JulieLeeMSFT

@EgorBo do you plan to merge this to .NET 7?

It probably does not meet the bar but I'd love to merge it in .NET 7.0, but no strong opinion

EgorBo avatar Aug 04 '22 13:08 EgorBo

It probably does not meet the bar but I'd love to merge it in .NET 7.0, but no strong opinion

FWIW, I would love that, too :) (as long as there's no meaningful perf regressions associated with it and there are no concerns about the data this exposes).

stephentoub avatar Aug 04 '22 13:08 stephentoub

no concerns about the data this exposes

Yep, we discussed this one too and came to conclusion that there is nothing to be afraid of and our competitors provide this ability too

EgorBo avatar Aug 04 '22 13:08 EgorBo

Does this PR also cover crossgen2?

Yes works for R2R. And, probably, for NativeAOT

NativeAOT compiler and crossgen use exact same infrastructure. If it works for one, it should work for the other one too.

You have to specify the option on the command line instead of env variable. Is that correct?

DOTNET_NgenDisasm

We should not be injecting new life into NGen name. Can we rename this so that it does not have NGen in the name?

I am wondering why we need both NgenDisasm and JitDisasm. Can we have just one that kicks in everywhere?

jkotas avatar Aug 04 '22 13:08 jkotas

You have to specify the option on the command line instead of env variable. Is that correct?

Correct, e.g. --codegenopt NgenDisasm="Test"

We should not be injecting new life into NGen name. Can we rename this so that it does not have NGen in the name?

I fully agree, should I still keep NgenDisasm for backward compatibility?

EgorBo avatar Aug 04 '22 13:08 EgorBo

I fully agree, should I still keep NgenDisasm for backward compatibility?

There is no backward compatibility to be worried about. We should delete all configs that start with NGen and use the Jit configs everywhere.

If we have any infra that use the NGen configs (I do not see any), it needs to be updated.

I see a few places that clear COMPlus_Ngen* in superpmi. They should be deleted. They are not doing anything useful anyway since NGen configs are not read from the environment.

jkotas avatar Aug 04 '22 13:08 jkotas

There is no backward compatibility to be worried about.

https://godbolt.org/ relies on it to print ASM for C#, but I assume it will only benefit from moving to release .NET as it won't have to compile .NET from sources and use public previews/daily builds of Release .NET instead cc @hez2010

EgorBo avatar Aug 04 '22 13:08 EgorBo

There is no backward compatibility to be worried about.

https://godbolt.org/ relies on it to print ASM for C#.

There is nothing to worry about. We can definitely setup a new net7.0 config for godbolt while keeping net6.0 as is.

hez2010 avatar Aug 04 '22 23:08 hez2010

@dotnet/jit-contrib Is ~+0.1% TP regression considered acceptable? (up to 0.14% for arm64 and up to 0.07% on x64)

It happens because of _idDebugOnlyInfo field in InstrDesc (makes it 16 bytes instead of 8) - I tried to mitigate it via different ways:

  1. Introduced a standalone hash-table JitHashTable<InstrDesc*, DebugInfo> but it wasn't clear what to use for keys: pointers didn't work well because in several places we do memcpy for IGs so they quickly become invalid (and there are some other tricks where we copy InstrDesc by value). Or use InstrDesc by value itself as a key - also doesn't work since we have last-chance "peepholes" where we modify parts of InstrDesc in various places

  2. Make InstrDesc struct variable size (allocate more space in JitDisasm for each) - it turned out be a very invasive change (~600 LOC) where I still wasn't sure I did it correctly and didn't forget anything (and had to propagate Compiler object in many places since JitTls::GetCompiler is not fast in Release)

So I think it's better to keep it as is - it's safer. I made a few clean ups to reduce the TP regression but can't get past 0.1%

EgorBo avatar Aug 05 '22 20:08 EgorBo

What does it translate to in terms of memstats (define MEASURE_MEM_ALLOC in jit.h and enable DOTNET_JitMemStats=1) for libraries.pmi/crossgen2 collections?

jakobbotsch avatar Aug 05 '22 20:08 jakobbotsch

What is the memory regression (especially for MinOpts)?

Edit: heh, seems like we've commented in parallel :).

Edit: I did some measurements with x64 CoreLib:

Regression with optimizations enabled:  1.190%
Regression with optimizations disabled: 2.591%

SingleAccretion avatar Aug 05 '22 20:08 SingleAccretion

Does the JIT dll import table change? (I.e., are there any new dependencies in the Release build?)

BruceForstall avatar Aug 05 '22 21:08 BruceForstall

For 64-bit platforms, if you can't use the instrDesc* as a key, you could change _idDebugOnlyInfo to a uint32_t index and use that as a key to a hash table or (dynamic) array, which would reduce per-instrDesc since increase from 8 to 4 bytes.

BruceForstall avatar Aug 05 '22 21:08 BruceForstall

I personally would not lose sleep over 0.1% TP and 2.6% more memory usage for jitting (that is recycled anyway) given how useful this is going to be for us. I am a little surprised that it was an invasive change to allocate the debug info pointer before the instrDesc when JitDisasm is enabled.

jakobbotsch avatar Aug 05 '22 21:08 jakobbotsch

0.1% TP

I assume that this is measured with optimizations on. What is the TP regression for minopts?

jkotas avatar Aug 05 '22 21:08 jkotas

Thanks for the detailed feedback!

Edit: I did some measurements with x64 CoreLib:

Regression with optimizations enabled:  1.190%
Regression with optimizations disabled: 2.591%

For me it's 1.1% memory regression (allocated per method by jit on average) for both optimized and non-optimized (but maybe I messed up with the non-optimized one)

Does the JIT dll import table change? (I.e., are there any new dependencies in the Release build?)

No new dependencies according to dumpbin /dependents clrjit.dll

For 64-bit platforms, if you can't use the instrDesc* as a key, you could change _idDebugOnlyInfo to a uint32_t index and use that as a key to a hash table or (dynamic) array, which would reduce per-instrDesc since increase from 8 to 4 bytes.

Nice idea! Should reduce jit's memory usage on 64bit indeed, will try

I assume that this is measured with optimizations on. What is the TP regression for minopts?

I've just tested - it's the same regardless of -jitoption JITMinOpts=1 (as I expected since memory layout doesn't depend on it)

I am a little surprised that it was an invasive change to allocate the debug info pointer before the instrDesc when JitDisasm is enabled.

70 places where we use SMALL_IDSC_SIZE, sizeof(InstrDesc) and sizeof(emitter::InstrDesc) and places where memory is casted to InstrDesc* (in all of those 70 places we have to propagate Compiler object)

EgorBo avatar Aug 05 '22 22:08 EgorBo

I've just tested - it's the same regardless of -jitoption JITMinOpts=1 (as I expected since memory layout doesn't depend on it)

Did you not run into problems with missing info when forcing min opts? I would expect that to be the case. I collected some by-tier stats for asp.net which I believe is the only one that has a significant number of tier 0 contexts (I really want this breakdown in the superpmi-diffs output):

Num contexts Base instructions Diff instructions PDIFF
Tier 0 66223 25145093032 25183708811 0.15%
Tier 1 50690 105217143245 105259661798 0.04%

Changeset for this: https://github.com/dotnet/runtime/commit/7da2080b11023ac83e7b2942389be11c2012df44

70 places where we use SMALL_IDSC_SIZE, sizeof(InstrDesc) and sizeof(emitter::InstrDesc) and places where memory is casted to InstrDesc* (in all of those 70 places we have to propagate Compiler object)

Hmm, naively I would not expect us to have to change these places. Let me try to play around with what I was thinking...

jakobbotsch avatar Aug 05 '22 22:08 jakobbotsch

(I really want this breakdown in the superpmi-diffs output)

Seems like a good idea. I opened https://github.com/dotnet/runtime/issues/73505 to track. Maybe there are other pivots as well? PGO versus non-PGO, etc.?

BruceForstall avatar Aug 05 '22 23:08 BruceForstall

Maybe there are other pivots as well? PGO versus non-PGO, etc.?

Yeah, I think we can find a lot of stuff. I have it on my quality week list to look into this.

Hmm, naively I would not expect us to have to change these places. Let me try to play around with what I was thinking...

Something like this: https://github.com/dotnet/runtime/commit/4be3ac2ec4d6b5b62dd7c8df8affe3a285b8a2e4

Survives disassembly for a simple "hello world" in release and survives asp.net superpmi replay with both checked and release JITs.

(Edit: FWIW, this also has 0.1% TP cost in terms of instructions executed :-))

jakobbotsch avatar Aug 05 '22 23:08 jakobbotsch

Maybe there are other pivots as well? PGO versus non-PGO, etc.?

Yeah, I think we can find a lot of stuff. I have it on my quality week list to look into this.

Hmm, naively I would not expect us to have to change these places. Let me try to play around with what I was thinking...

Something like this: 4be3ac2

Survives disassembly for a simple "hello world" in release and survives asp.net superpmi replay with both checked and release JITs.

(Edit: FWIW, this also has 0.1% TP cost in terms of instructions executed :-))

@jakobbotsch Awesome! Presumably, it takes less memory 🙂

EgorBo avatar Aug 06 '22 00:08 EgorBo

No memory regressions with your patch, awesome! 🎉 https://www.diffchecker.com/XUlCiSbK (crossgenning corelib without opts)

EgorBo avatar Aug 06 '22 09:08 EgorBo

@jakobbotsch hm.. superpmi-diffs reports worse TP regression for this change was: https://dev.azure.com/dnceng/public/_build/results?buildId=1926081&view=ms.vss-build-web.run-extensions-tab now: https://dev.azure.com/dnceng/public/_build/results?buildId=1927400&view=ms.vss-build-web.run-extensions-tab arm64: 0.14% -> 0.20% x64: 0.07% -> 0.13%

is it a normal fluctuation or not but still better than memory usage regression?

EgorBo avatar Aug 06 '22 12:08 EgorBo

is it a normal fluctuation or not but still better than memory usage regression?

I don't think it's noise, it probably is executing that many more instructions. However I don't think the number of instructions executed here is going to translate to the same increase in CPU expenditure since these instructions are going to be very cheap (basically some extra add instructions, assuming the field gets hoisted). Somewhat similar to @SingleAccretion's situation in #71597. I can try to measure some confidence intervals for JIT time for libraries.pmi instead of instructions executed and see if we can see any difference there, though I expect the variance is so high that it's probably not going to give much insight.

jakobbotsch avatar Aug 06 '22 12:08 jakobbotsch

@kunalspathak this code in emitLoopAlign looks suspicious to me: https://github.com/dotnet/runtime/blob/2db51aa5539ad9e444d33574459374680a4ba098/src/coreclr/jit/emit.cpp#L5192-L5217

emitNewInstrAlign can create new IGs for other reasons than emitForceNewIG, for example if we do not have enough space in the IG for a new instrDesc, or if we have reached the instruction count limit (255). It happens with this change in a few SPMI contexts.

I have pushed a fix in https://github.com/dotnet/runtime/pull/73365/commits/7255863f712d97d180a873fd23f139cbc4fd9ecb, please let me know if you see any problems with this. (EDIT: I've pulled this fix into #73518 instead. EDIT 2: I realized I just needed to update emitCheckAlignFitInCurIG instead)

jakobbotsch avatar Aug 06 '22 13:08 jakobbotsch

BTW, do we want to make COMPlus_JitStdOutFile available in release too? It will potentially allow us to use release jits with SPMI asmdiffs/jit-diff.

jakobbotsch avatar Aug 06 '22 14:08 jakobbotsch

Oh, thanks for taking care of it!!

BTW, do we want to make COMPlus_JitStdOutFile available in release too?

Can we do that in follow-ups? There are other optional knobs we might consider porting

EgorBo avatar Aug 07 '22 11:08 EgorBo

CI failure is unrelated: https://github.com/dotnet/runtime/issues/73247

EgorBo avatar Aug 07 '22 12:08 EgorBo

Can we do that in follow-ups?

Sounds fine to me (I also looked into it a bit and it might not be that trivial). We should probably run most relevant pipelines on this.

jakobbotsch avatar Aug 07 '22 13:08 jakobbotsch

/azp list

EgorBo avatar Aug 07 '22 13:08 EgorBo

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr r2r, runtime-coreclr crossgen2, runtime-libraries-coreclr outerloop, runtime-coreclr crossgen2 outerloop, Antigen, Fuzzlyn, runtime-extra-platforms

EgorBo avatar Aug 07 '22 13:08 EgorBo

Azure Pipelines successfully started running 10 pipeline(s).

azure-pipelines[bot] avatar Aug 07 '22 13:08 azure-pipelines[bot]

We should probably run most relevant pipelines on this.

Note that we do little to no testing with Release builds (except the checked/release asm diffs pipeline), so you'll want to think about whether one-off (manual) Release build testing is warranted.

BruceForstall avatar Aug 07 '22 16:08 BruceForstall

We should probably run most relevant pipelines on this.

Note that we do little to no testing with Release builds (except the checked/release asm diffs pipeline), so you'll want to think about whether one-off (manual) Release build testing is warranted.

Do you mean Release when JitDisasm is set? I've tested it locally with DOTNET_JitDisasm=* (all) for JIT (running a large app), crossgen prejitting corelib and NativeAOT compiling a console app - no asserts/crashes. When it's not defined (by default) it's well tested on CI as is.

Btw, checked that it works on NativeAOT

EgorBo avatar Aug 08 '22 09:08 EgorBo

Outerloop pipelines failures seem unrelated:

  • Fuzzlyn: same here, e.g. x86 failure is https://github.com/dotnet/runtime/issues/70786
  • coreclr Pri0 Runtime Tests Run Linux arm checked failure is https://github.com/dotnet/runtime/issues/73247
  • runtime-coreclr crossgen2 outerloop failures are infra-related e.g. error : A call to an Azure DevOps api returned 401, which may indicate a bad 'System.AccessToken' value.
  • runtime-coreclr gcstress0x3-gcstress0xc are known issues, e.g. https://github.com/dotnet/runtime/issues/69092#issuecomment-1207550785
  • runtime-coreclr outerloop failures seem to be infra-related '"C:\h\w\A18C08B3\p\dotnet.exe"' is not recognized as an internal or external command

EgorBo avatar Aug 08 '22 09:08 EgorBo

I have started a non-PR run of runtime here, which will get us some more coverage for release JIT with the libraries tests.

The change is somewhat low risk for release since when m_debugInfoSize = 0, the layout just collapses back to what it was before. So even if we forgot to account for it somewhere, it is unlikely that will cause problems for release non-disasm builds.

My main concern would be that we have an idDebugOnlyInfo() use somewhere that is not conditioned on the debug info being present, but I've audited the uses and wasn't able to find any way that could happen.

jakobbotsch avatar Aug 08 '22 09:08 jakobbotsch

My main concern would be that we have an idDebugOnlyInfo() use somewhere that is not conditioned on the debug info being present, but I've audited the uses and wasn't able to find any way that could happen.

Right, I audited those too (and inserted a failfast if we entered them with .disAsm = false) locally

EgorBo avatar Aug 08 '22 09:08 EgorBo

@dotnet/jit-contrib @jakobbotsch @BruceForstall PTAL

EgorBo avatar Aug 08 '22 15:08 EgorBo

Seems like we should allow users to specify JitDiffableDasm too?

If you are live debugging it would be nice to see the actual addresses and such.

https://github.com/dotnet/runtime/blob/e71a9583b4d6c9bd97edd87cda7f98f232f63530/src/coreclr/jit/disasm.cpp#L1461-L1467

Might also be nice to be able to control JitDasmWithAddress and JitDasmWithAlignmentBoundaries and perhaps even JitDisasmWithGC and JitDisasmWithDebugInfo.

There are a whole bunch more like this: unwind, eh, ... but they might take a lot more work.

AndyAyersMS avatar Aug 08 '22 16:08 AndyAyersMS

@EgorBo Can we assume that it will always start with "; Assembly listing for method " and end with "; Total bytes of code "?

omariom avatar Aug 08 '22 16:08 omariom

Seems like we should allow users to specify JitDiffableDasm too?

If you are live debugging it would be nice to see the actual addresses and such.

https://github.com/dotnet/runtime/blob/e71a9583b4d6c9bd97edd87cda7f98f232f63530/src/coreclr/jit/disasm.cpp#L1461-L1467

Might also be nice to be able to control JitDasmWithAddress and JitDasmWithAlignmentBoundaries and perhaps even JitDisasmWithGC and JitDisasmWithDebugInfo.

There are a whole bunch more like this: unwind, eh, ... but they might take a lot more work.

Yes, I had to draw the line somewhere for the initial impl since, obviously, supporting all of them is a lot more work, I am going to take a look at JitDiffableDasm separately if you don't mind. JitDasmWithAlignmentBoundaries is also nice to have - it draws nice lines 😄

EgorBo avatar Aug 08 '22 17:08 EgorBo

@EgorBo Can we assume that it will always start with "; Assembly listing for method " and end with "; Total bytes of code "?

Good question, I assume some 3rd party tools are going to depend on it (parse) and we should not ever change it So I personally is fine with the current format, if anyone has better ideas - feel free to change now

EgorBo avatar Aug 08 '22 17:08 EgorBo

@EgorBo Can we assume that it will always start with "; Assembly listing for method " and end with "; Total bytes of code "?

Good question, I assume some 3rd party tools are going to depend on it (parse) and we should not ever change it So I personally is fine with the current format, if anyone has better ideas - feel free to change now

There could be a separate env var that'd enable a special stable parsable output, something like git does for tools invoking it.

MichalPetryka avatar Aug 08 '22 17:08 MichalPetryka

Started to review, but it seems we are not printing PerfScore yet?

kunalspathak avatar Aug 09 '22 17:08 kunalspathak

It would be nice if the number of places doing + m_debugInfoSize were more minimized, via wrapper functions or the like.

I agree with @BruceForstall . There are too many places where we are doing + m_debugInfoSize and it is easy to forget doing that in future. A wrapper function will be helpful.

kunalspathak avatar Aug 09 '22 17:08 kunalspathak

I pushed a commit that adds emitFirstInstrDesc and emitAdvanceInstrDesc and uses these in place of the manual castto/buffer accesses, which consolidates most uses of m_debugInfoSize.

jakobbotsch avatar Aug 09 '22 18:08 jakobbotsch

I pushed a commit that adds emitFirstInstrDesc and emitAdvanceInstrDesc and uses these in place of the manual castto/buffer accesses, which consolidates most uses of m_debugInfoSize.

The functions are a nice improvement.

BruceForstall avatar Aug 09 '22 18:08 BruceForstall

Does the JIT dll import table change? (I.e., are there any new dependencies in the Release build?)

We get two new IAT entries on Windows: __stdio_common_vsprintf_s and strnlen.

jakobbotsch avatar Aug 09 '22 19:08 jakobbotsch

Merging now. We're going to keep eyes on TP-related benchmarks (e.g. "Crossgen2 Throughput Trends" page) in coming weeks.

Kudos to @jakobbotsch for mitigating all the initial TP/Memory regressions 🎉

EgorBo avatar Aug 10 '22 14:08 EgorBo