ghidra Registers destroyed by call to function returning void but referenced after call appear to retain old contents

Describe the bug The ARM convention is that a function may destroy the contents of r0, r1, r2, and r3. Yet, ghidra seems unaware of this.

In particular, if a __stdcall function is accidentally declared as returning void, then ghidra believes that in the calling code, r0 still contains whatever it did before the call. This can lead to very misleading decompilation!

Not sure what the best thing to do is, but probably something like "in_r0" is used for an argument value that is not declared for the function.

To Reproduce Steps to reproduce the behavior:

Examine code that calls a __stdcall function and uses the return value in subsequent calculations. Change the called function to return void. Now ghidra behaves as if r0 was saved across the function call and still has the value that it had before the call. Nothing indicates that there is any issue with the disassembly.

Expected behavior I would expect some indication that after the function call, if r0..r3 is used, that the value is undefined (r0 being okay if the function returns a value, r0..r1 if it returns a longlong, etc.) What notation is used is not that important, but silently presuming that the value is preserved across the call leads to severe confusion trying to understand the code. If there is a strong case for assuming the registers are unchanged, perhaps that should be another calling style?

Oct 27 '22 21:10 marcushall42

Sorry, forgot to note that this is version 10.1.5 running on linux, but I don't think that that matters..

Oct 27 '22 21:10 marcushall42

Which ARM processor variant are you using in ghidra?

Oct 28 '22 11:10 GhidorahRex

@marcushall42 , Are you able to provide a sample where this can be observed? Since this is an ABI and decompiler related concern, the sample would be needed to verify the appropriate ABI standard. Any details in this regard would be helpful.

Nov 02 '22 15:11 ghidra1

Language is ARM:LE:32:v7 (1.103)

It is readily demonstrable by looking around for a function where the return value is used in following code. If the called function is then set to return 'void', the decompiler will then start using the previous value in r0 before the call..

Here's an example:

                         **************************************************************
                         *                          FUNCTION                          *
                         **************************************************************
                         undefined FUN_4128169c()
                           assume LRset = 0x0
                           assume TMode = 0x1
         undefined         r0:1           <RETURN>
                         FUN_4128169c                                    XREF[3]:     FUN_40c86e74:40c86e96(c), 
                                                                                      410eaeb8(c), 
                                                                                      FUN_413e0288:413e0294(c)  
    4128169c 2d e9 f0 41     push       { r4, r5, r6, r7, r8, lr  }
    412816a0 04 46           mov        r4,r0
    412816a2 01 25           movs       r5,#0x1
    412816a4 00 26           movs       r6,#0x0
    412816a6 71 f1 60 d8     bl         FUN_41bf276a                                     undefined FUN_41bf276a(void)
    412816aa 07 46           mov        r7,r0
    412816ac 4f f4 da 78     mov.w      r8,#0x1b4

The decompiler shows the function as this:

iVar4 = 1; iVar5 = 0; iVar1 = FUN_41bf276a(); do { iVar2 = iVar4;

iVar1 has it's value in r7.. If I change FUN_41bf276a0() so that it's return type isn't "unknown" but instead is "void", the decompiler produces:

iVar4 = 1; iVar5 = 0; iVar1 = param_1; FUN_41bf276a(); do { iVar2 = iVar4;

iVar1 now assumes the value "param_1" because that is r0 on entry to the function.

Nov 02 '22 17:11 marcushall42

I've just noticed this problem on other architectures as well. In particular, if registers are used to pass the first arguments, and a register holds the return value, then the argument for (usually) param_1 is used as the return value.

The problem is that this can be easily overlooked when trying to analyze the calling function. The calling convention (frequently) does not preserve the argument registers, so they really should be marked something like "unaff_a0", similar to if an argument declaration was missing. Then, it would jump out at you that something was missing. As it is, if the supposed-to-be return value is used several lines later, ghidra currently just uses whatever was passed as param_1 instead, and it may take a lot of staring at it to figure out that things aren't making sense because the function was wrongly declared to return void.

Yes, it is my mistake to have declared the function wrong, but ghidra seems to be fighting against me recognizing the problem when it could fairly easily (it seems) be helping me instead.

Apr 02 '24 22:04 marcushall42

This just got me AGAIN. And I'm even nominally looking for it! Ghidra was showing me:

      evt = nwsel_selectEventTimer(7);
      os_schedule_event_checked
                        (evt,nwsel_plmn_base_timer_item_event_handler,&pp10->event,uVar3);
      pp10->event = evt;

And this was the only place that implied that pp10->event was an event timer pointer, while most other places it sort of made more sense that it was an event record pointer.

But, it turned out that os_schedule_event_checked() really doesn't return void, but a few layers down a called function was returning and event pointer instead! So, after changing the return type of os_schedule_event_checked() away from void, I get this code, that now agrees with everything.

      evt = nwsel_selectEventTimer(7);
      event = os_schedule_event_checked
                        (evt,nwsel_plmn_base_timer_item_event_handler,&pp10->event,uVar3);
      pp10->event = event;

Now, looking at the assembly, it's rather obvious that a0 (first argument) was assigned evt before calling os_schedule_event_checked(), and then was stored in pp10->event. (It's the return value as well.) But, it's in the set of registers destroyed across a function call, so even if os_schedule_event_checked() returned void, a0 should be undefined, not the value it had before the call.

I'm considering the merit of a script to check if registers destroyed across a function call are used after a call before being set. Because ghidra doesn't make any noise about it, it's very easy to overlook and can be hard to notice, even if you're aware of the issue.

This is tagged as Processor/ARM because that is what I originally reported it as, but this is currently on a nanomips target. I think it's really a decompiler issue, so perhaps the tag can be updated?

Apr 17 '24 14:04 marcushall42

You should try to determine what is setting void as the return type. If it is unknown it should be returning undefined. The decompiler removes quite a bit of dead code. Setting the return type as void indicates it was known not to be returning anything so any write to a register which is never read would be treated as dead code. If the return type had remained undefined it would have preserved it as a returned value.

Apr 18 '24 18:04 ghidra1

I'm pretty sure I set it somehow. In this case, I went back and deleted and re-created the function and the disassembly showed the return as "undefined", but the decompiler showed it as "void", so if I did a "commit parameters/return" or edited the signature from the decompiler, it could have been declared as void that way.

But, I'm okay with that. My issue is from the caller code. The calling code loads a0 with the first argument to the function, then after calling the function, a0 is stored into memory. Now, even if the function returns void, registers a0-a7 are destroyed across calls (Marked as killedbycall in the .cspec file), yet the decompiler allows the value to persist across the call. I think that the decompiler should mark the registers that aren't saved across a function call similar to registers used for argument passing without any assigned content. Then it would be clearer at the caller that something was wrong. As it is, the decompilation looks clean and the error can be really difficult to locate. Yes, it's my error for declaring the function as void, but ghidra could easily (I think) assist me in discovering this rather than hiding the error from me.

Apr 18 '24 21:04 marcushall42

Provided the signature has not been locked-in, the decompiler should float the return value based upon the calling convention employed. I am uncertain which calling convention and language you are using where a0-a7 is used and what the expected return storage varnode is. If the function is not compliant with the defult/chosen calling convention or an input value is returned without being set by the function, it may be neccessary to lock-in the signature and modify it to reflect the proper return datatype and possibly apply custom storage.

May 02 '24 17:05 ghidra1

Are we talking about the calleing code, or the called code? My problem is with the disassembly of the calling code.

This is with the nanomips language, where a0 is normally the first argument value as well as the first word of the return value (although I originally saw this on ARM). If the code is something like:

a0 = 5
call function
variable = a0

If function is declared to return void, then the decompiler produces output like: function(5) variable = 5

What I would like to see is something like: function(5) variable = extraout_a0

This would tell me that function is really returning some value and I have mistakenly declared it to return void. Once the definition of function is corrected, then I expect to see something like: iVal1 = function(5) variable = iVal1

I have seen a function that the compiler seems to have had inside knowledge about, where a6 would be normally destroyed across a call, and ghidra properly noted this and subsequent use of the value was tagged as extraout_a6, so in some cases, registers destroyed across function calls do get tagged, but it seems that the primary return value doesn't (and maybe all registers don't) when the function returns void.

May 02 '24 18:05 marcushall42