ILSpy icon indicating copy to clipboard operation
ILSpy copied to clipboard

Request for advice in decompiling older Mono assemblies

Open dongresource opened this issue 3 years ago • 8 comments

Details

Hi,

A friend and I are trying to iron out some decompilation errors with a C# DLL compiled with Mono 1.2.5. We understand that you don't target versions this old so we aren't treating this as a bug report, but more as a request for comment. We don't have too much experience with C# or the CLR, so we're hoping you could give us some advice on how to best modify the existing Transforms and/or write our own ones that correctly match the instruction patterns we're looking for without being so broad as to cause erroneous matches elsewhere in the assembly.

We've already made a specialized fork of ILSpy specifically for our use case: https://github.com/gsemaj/FFSpy. Feel free to pull in any changes on the off chance you find them general enough to be useful.

The first problem we tackled was related to https://github.com/icsharpcode/ILSpy/issues/1893. We've gotten that working pretty well, as far as we can tell. We tweaked the pattern-matching logic in YieldReturnDecompiler so it properly matches the instructions Mono 1.2.5 emits. (in a545841a37f6b01bd3da3baf76717b67c10fca5c and 3af69ae06989406c697938a94569111410143ee8)

We forced the RemoveDeadStores setting to true, as we had a lot of those. (in e0b68a814209399bec580da77c12f09c040c8853)

And most recently, we've added a pass that gets rid of an issue where default constructors for object fields would erroneously get decompiled into variables being assigned to themselves, which would cause compilation issues if the decompiler decided to hoist those assignments outside of their parent constructor into static initializers. (in 76561b1b8024dbbc2cf5f48b2d9490f42d34ddd9)

The main problem we're dealing with right now is a failure to detect a construct where the compiler emits a CompilerGenerated class as a sort of proxy for this, and then proceeds to access object fields through it instead of doing it directly. This causes compilation to fail as the names of the compiler generated objects aren't valid identifiers and their definitions have been (correctly) elided.

Erroneous output

Here's an example snippet of the erroneously decompiled C#. c__CompilerGenerated79 is the "this proxy".

private void ValueColor()
{
	<>c__CompilerGenerated79 <>c__CompilerGenerated = new <>c__CompilerGenerated79(this);
	Color color = <>c__CompilerGenerated.<80:<>THIS>.colorW;
	<>c__CompilerGenerated.<80:<>THIS>.colorDod = color;
	color = color;
	<>c__CompilerGenerated.<80:<>THIS>.colorDef = color;
	color = color;
	<>c__CompilerGenerated.<80:<>THIS>.colorPow = color;
	<>c__CompilerGenerated.<80:<>THIS>.colorAcc = color;
	// snip

Input code

The corresponding IL looks like this:

.method private hidebysig 
    instance void ValueColor () cil managed 
{
    // Method begins at RVA 0x56a3c
    // Code size 1016 (0x3f8)
    .maxstack 12
    .locals init (
        [0] class EquipPopup/'<>c__CompilerGenerated79',
        [1] class cnEvent,
        [2] class cnItemDisplayInfo,
        [3] class cnItemDisplayInfo,
        [4] class InventoryManagerScript/PlayerInventoryItem,
        [5] valuetype [UnityEngine]UnityEngine.Color
    )

    IL_0000: nop
    IL_0001: ldc.i4 227
    IL_0006: pop
    IL_0007: nop
    IL_0008: ldarg.0
    IL_0009: newobj instance void EquipPopup/'<>c__CompilerGenerated79'::.ctor(class EquipPopup)
    IL_000e: stloc.0
    IL_000f: ldloc.0
    IL_0010: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
    IL_0015: ldfld valuetype [UnityEngine]UnityEngine.Color EquipPopup::colorW
    IL_001a: stloc.s 5
    IL_001c: ldloc.0
    IL_001d: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
    IL_0022: ldloc.s 5
    IL_0024: stfld valuetype [UnityEngine]UnityEngine.Color EquipPopup::colorDod
    IL_0029: ldloc.s 5
    IL_002b: stloc.s 5
    IL_002d: ldloc.0
    IL_002e: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
    IL_0033: ldloc.s 5
    IL_0035: stfld valuetype [UnityEngine]UnityEngine.Color EquipPopup::colorDef
    IL_003a: ldloc.s 5
    IL_003c: stloc.s 5
    IL_003e: ldloc.0
    IL_003f: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
    IL_0044: ldloc.s 5
    IL_0046: stfld valuetype [UnityEngine]UnityEngine.Color EquipPopup::colorPow
    IL_004b: ldloc.0
    IL_004c: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
    IL_0051: ldloc.s 5
    IL_0053: stfld valuetype [UnityEngine]UnityEngine.Color EquipPopup::colorAcc
    IL_0058: ldloc.0

    // snip

And the compiler generated class that it references looks like this:

.class nested private auto ansi beforefieldinit '<>c__CompilerGenerated79'
    extends [mscorlib]System.Object
{
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
        01 00 00 00
    )
    // Fields
    .field assembly class EquipPopup '<80:<>THIS>'

    // Methods
    .method public hidebysig specialname rtspecialname 
        instance void .ctor (
            class EquipPopup parent
        ) cil managed 
    {
        // Method begins at RVA 0xefec4
        // Code size 19 (0x13)
        .maxstack 8

        IL_0000: ldarg.0
        IL_0001: call instance void [mscorlib]System.Object::.ctor()
        IL_0006: ldarg.0
        IL_0007: ldarg parent
        IL_000b: nop
        IL_000c: nop
        IL_000d: stfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
        IL_0012: ret
    } // end of method '<>c__CompilerGenerated79'::.ctor

    .method assembly hidebysig 
        instance bool '<ValueColor>c__401'<T> (
            class InventoryManagerScript/PlayerInventoryItem PII
        ) cil managed 
    {
        .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
            01 00 00 00
        )
        // Method begins at RVA 0xefed8
        // Code size 44 (0x2c)
        .maxstack 3
        .locals init (
            [0] bool
        )

        IL_0000: ldarg.1
        IL_0001: ldfld int32 InventoryManagerScript/PlayerInventoryItem::iSlotType
        IL_0006: brtrue IL_0028

        IL_000b: ldarg.1
        IL_000c: ldfld int32 InventoryManagerScript/PlayerInventoryItem::SlotID
        IL_0011: ldarg.0
        IL_0012: ldfld class EquipPopup EquipPopup/'<>c__CompilerGenerated79'::'<80:<>THIS>'
        IL_0017: ldfld class InventoryManagerScript/PlayerInventoryItem BagicPopup::mClickItem
        IL_001c: ldfld valuetype InventoryManagerScript/sItemType InventoryManagerScript/PlayerInventoryItem::ItemType
        IL_0021: ceq
        IL_0023: br IL_0029

        IL_0028: ldc.i4.0

        IL_0029: ret

        IL_002a: ldloc.0
        IL_002b: ret
    } // end of method '<>c__CompilerGenerated79'::'<ValueColor>c__401'

} // end of class <>c__CompilerGenerated79

I've attached the full C# and IL shown in the snippets and have no problem providing the full assembly if you're willing to look more closely at it, but it's fairly large (1.63MB worth of compiled code). Let me know if you want us to try and write a minimized example that Mono 1.2.5 compiles into minimal code that reproduces the issue.

Conclusion

We're hoping you could give us some advice on how to proceed.

  • Is this an instance of something that an existing Transform is supposed to handle, but fails due to the older compiler emitting different opcodes?
  • or do we need to write our own Transform?
  • In either case, do you have any tips on writing the instruction pattern matching logic such that it doesn't risk being too broad and interfering with decompilation of unrelated parts of the assembly, while at the same time hopefully not being too tedious to write?
  • Less importantly, do you think our changes to YieldReturnDecomplier look okay? It's a bit hacky and there's a few places where we left the pattern matching a bit broad, but it hasn't caused any issues as far as we can tell.
  • Do you think there's a chance that this decompilation issue might be tainted by our earlier changes, in general?
  • Do you have any advice that might help with what we're doing here in general (trying to fix decompilation errors from older Mono versions)?

dongresource avatar Dec 31 '21 00:12 dongresource

Update: We've deduced that it's related to the class being used in an invocation of List<T>.Find() with a lambda. The snippets above stop just before that code occurs, but the attachment contains the whole thing.

We've now written a minimized example.

Input code

using System;
using System.Collections;
using System.Collections.Generic;

public class Test {
	public static List<int> stuff;
	public static void Main() {
		int x = stuff.Find(a => a == 3);
	}
}

Erroneous output

// Test
using System.Collections.Generic;

public class Test
{
        public static List<int> stuff;

        public static void Main()
        {
                _003C_003Ec__CompilerGenerated0 @object = new _003C_003Ec__CompilerGenerated0();
                stuff.Find(@object._003CMain_003Ec__1<int>);
        }
}

IL

.class public auto ansi beforefieldinit Test
        extends [mscorlib]System.Object
{
        // Nested Types
        .class nested private auto ansi beforefieldinit '<>c__CompilerGenerated0'
                extends [mscorlib]System.Object
        {
                .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
                        01 00 00 00
                )
                // Methods
                .method public hidebysig specialname rtspecialname 
                        instance void .ctor () cil managed 
                {
                        // Method begins at RVA 0x2128
                        // Code size 7 (0x7)
                        .maxstack 8

                        IL_0000: ldarg.0
                        IL_0001: call instance void [mscorlib]System.Object::.ctor()
                        IL_0006: ret
                } // end of method '<>c__CompilerGenerated0'::.ctor

                .method assembly hidebysig 
                        instance bool '<Main>c__1'<T> (
                                int32 a
                        ) cil managed 
                {
                        .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
                                01 00 00 00
                        )
                        // Method begins at RVA 0x2130
                        // Code size 7 (0x7)
                        .maxstack 3
                        .locals init (
                                [0] bool
                        )

                        IL_0000: ldarg.1
                        IL_0001: ldc.i4.3
                        IL_0002: ceq
                        IL_0004: ret

                        IL_0005: ldloc.0
                        IL_0006: ret
                } // end of method '<>c__CompilerGenerated0'::'<Main>c__1'

        } // end of class <>c__CompilerGenerated0


        // Fields
        .field public static class [mscorlib]System.Collections.Generic.List`1<int32> stuff

        // Methods
        .method public hidebysig specialname rtspecialname 
                instance void .ctor () cil managed 
        {
                // Method begins at RVA 0x20ec
                // Code size 7 (0x7)
                .maxstack 8

                IL_0000: ldarg.0
                IL_0001: call instance void [mscorlib]System.Object::.ctor()
                IL_0006: ret
        } // end of method Test::.ctor

        .method public hidebysig static 
                void Main () cil managed 
        {
                // Method begins at RVA 0x20f4
                // Code size 38 (0x26)
                .maxstack 5
                .entrypoint
                .locals init (
                        [0] class Test/'<>c__CompilerGenerated0',
                        [1] int32
                )

                IL_0000: nop
                IL_0001: ldc.i4 1
                IL_0006: pop
                IL_0007: nop
                IL_0008: newobj instance void Test/'<>c__CompilerGenerated0'::.ctor()
                IL_000d: stloc.0
                IL_000e: ldsfld class [mscorlib]System.Collections.Generic.List`1<int32> Test::stuff
                IL_0013: ldloc.0
                IL_0014: ldftn instance bool Test/'<>c__CompilerGenerated0'::'<Main>c__1'<int32>(int32)
                IL_001a: newobj instance void class [mscorlib]System.Predicate`1<int32>::.ctor(object, native int)
                IL_001f: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1<int32>::Find(class [mscorlib]System.Predicate`1<!0>)
                IL_0024: stloc.1
                IL_0025: ret
        } // end of method Test::Main

} // end of class Test

This output is from ILSpy version 7.1.0.6523-preview1, as that's the one I had lying around.

Note that the executable needs to be compiled with the gmcs command (included as part of that version of Mono), as base mcs apparently didn't support generics and lambdas at the time. Here's the compiled executable, for convenience.

Note also that this minimized executable has the lambda/delegate manipulate only primitive types, unlike our first example. The compiler-generated object in our first example has a member with a reference to the file's main class (which it takes as an argument to its constructor). A number of accesses to the main class's members then happens through the compiler-generated object, which was our main issue.

I assume fixing this will require adding a new Transform that:

  • Identifies a method of a compiler-generated object being passed to a method using the ldftn instruction
  • Rewrites that statement into a lamda comprised of the contents of the compiler-generated object's method
  • Finds the newobj that created it, takes note of its arguments and deletes the relevant instructions
  • Searches for any other interactions with its arguments and rewrites them to use the main object directly

dongresource avatar Jan 02 '22 00:01 dongresource

We've now identified that this is supposed to be handled by ICSharpCode.Decompiler/IL/Transforms/DelegateConstruction.cs.

It seems to work correctly with non-generic delegates in programs compiled by Mono 1.2.5. But when the delegate is generic (like Predicate<T>), the singular MethodTypeArgument in GenericContextFromTypeArguments() fails to conform to the ITypeParameter interface, causing the method to return null.

Removing the else clause in that loop makes the decompile of the delegate complete successfully, but it understandably doesn't clean up the other indirect accesses of the class members that go through the compiler-generated object. It makes sense that GenericContext needs that context to clean them up. We did notice however that that singular type argument does satisfy ITypeDefinition.

Any ideas?

dongresource avatar Jan 02 '22 04:01 dongresource

but it understandably doesn't clean up the other indirect accesses of the class members that go through the compiler-generated object

The compiler-generated classes are handled by TransformDisplayClassUsage. Sorry, I currently don't have the time to dig deeper and take a look at all the things you have written above and I am not sure, whether I get to it before February.

May I suggest you to check out the ILSpy-tests submodule to your fork? Feel free to add test configurations for mcs 1.2.5 to our tests, similar to how I added mcs 5 in

  • https://github.com/icsharpcode/ILSpy-tests/commit/6f8860e420b54bdfd726ec3c58a4d178416f9156
  • https://github.com/icsharpcode/ILSpy/commit/7b3940a8182a39abafb71d62cd4217c214e7b12b

Thanks!

siegfriedpammer avatar Jan 02 '22 11:01 siegfriedpammer

We understand that you don't target versions this old so we aren't treating this as a bug report, but more as a request for comment

I don't remember explicitly stating that we don't want to support old versions, for example the decompiler supports C# "switch on string" pattern down to C# 1.1. From my point of view it depends on customer demand and also whether the issues are hard to fix. Note that we have introduced several decompilation options that deal with old compiler behavior.

The requests we have to reject are extended deobfuscation support (anything that's beyond "decompiler should not crash"). The only old compiler behavior (I remember) we also didn't implement, because it is very very niche and turns out to be very hard to fix is: https://github.com/icsharpcode/ILSpy/issues/1159#issuecomment-394976610

Thank you very much!

In general, I would be very interested in receiving these as standard bug reports, you could add a proposed fix.

siegfriedpammer avatar Jan 02 '22 11:01 siegfriedpammer

I don't remember explicitly stating that we don't want to support old versions

Ah. It occurred to me while I was writing that that I couldn't find anything that says you don't target older versions. I had just assumed that was the case.

We've successfully solved the decompilation issues on our end. The resulting code can be recompiled without any trouble and demonstrates behavior identical to the original assemblies when run.

We hope to find some time later to clean up our changes so we can submit pull requests for them. Our current branch supports Mono 1.2.5 targets exclusively; bypassing general control flow in a few places. We'll fix that up and run the testing suite to make sure we haven't introduced any regressions, and we'll add specific tests for Mono 1.2.5 executables.

Should we submit that as a single pull request, or split it into separate ones for changes to YieldReturnDecompiler, TransformDisplayClassUsage, etc?

Thanks for the help!

dongresource avatar Jan 05 '22 14:01 dongresource

I found a strange decompiling bug in a yield instruction: yield return new WaitWhile(Func<bool>(null, (nint)(&Program.get_IsOpen))); After a fast search in the .Net documentation, and the code context, I figured out as: yield return new WaitWhile(() => !Program.IsOpen);

In this case, if the "null" are "this", just replace my correction for () => !this.IsOpen)

There´s another similar cases, ever with same pattern, just changing the Property name.

Another bug I found is "var" case:

<> __0033__anon_8932894852<int,string> anon; anon = from p in Dictionary<string,string> where ...

Here the correct code is "var":

var anon = from p in Dictionary<string,string> where...

for the rest, I not having suitable problems to use the decompiled code. (Some cleaning needed sometimes)

fernandodarci avatar Mar 26 '22 17:03 fernandodarci

This looks like delegate construction was not detected properly. Would you be able to provide us with the assembly?

siegfriedpammer avatar Mar 26 '22 18:03 siegfriedpammer

Sure: Assembly-CSharp.zip

A third bug I found is called "return" case: yield return new WaitUntil(new Func<bool>(this, (nint)__ldftn(ActionScene.get_MiniMapAndCameraActive)));

The correct is yield return new WaitUntil(() => !MiniMapAndCameraActive);

This assembly is from the game "Koitatsu Party", Unity Pro 5.6.0f

You will find the yield bugs in methods signed with unsafe

fernandodarci avatar Mar 26 '22 18:03 fernandodarci