ghidra "Split out as new variable" support for stack variables

Describe the solution you'd like: Right now, this feature only works with registers (and it works well!). For stack variables, the option isn't even offered. It's a very useful feature when you want to use different names and types for the same variable.

Dec 20 '20 18:12 PsychoPast

This feature is desperately needed

Apr 19 '21 18:04 larkwiot

You can see how reassignment works for uniques (the ones labeled HASH) and registers if you look at the disassembly, e.g. the assignment HASH:39f5e6c302b:4 = uVar100. Ghidra also does this automatically, for example when a variable goes out of scope. Internally it just sees that the previous definition of the varnode has expired, so it uses a new one with its own high-level (decompiler-side) type and name.

The stack frame, on the other hand, is single and global within a function: it is shared among code labels, blocks, etc., so nothing in it can simply "expire". You can see this if you commit a change to any stack local and then open the stack frame from the disassembly. Only one copy of the frame is currently supported for the whole function; maintaining multiple copies would be far from trivial.

Does this happen in cases other than the "spill and reload" idiom, where compilers temporarily store on the stack a value that doesn't fit in registers because of register pressure? And if it really makes sense, how would one edit such dynamic stack frame(s)?

Sep 02 '21 19:09 Danil6969

I've found a proper use for such a feature; it's described here: https://github.com/NationalSecurityAgency/ghidra/issues/1510. For now, the best workaround is to copy the decompilation in parts to a separate text file: once before you change the stack frame, and again after you remap the locals for the other part of the function. Maybe it's possible to keep more tables for the stack frame, but currently we don't even know how to handle such copies in the stack frame editor.

Sep 02 '21 20:09 Danil6969

Is it possible to add a Block ID and Depth Number when assigning stack locations as types and variables? For example, in an if/while/... block, user declarations are scoped within that block, effectively creating a stack hierarchy.

We also need to be able to create our own blocks, as I've discovered stack reuse in monolithic functions, effectively within the same block!
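
As an illustration of the block-scoped stack reuse described above, here is a minimal C sketch. The function and variable names are made up for this example. The two arrays have disjoint lifetimes and the same size, so a compiler is free to place them at the same stack offsets; whether it actually does depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* Both element types have the same size, so the two arrays below
 * could occupy exactly the same bytes of the stack frame. */
static_assert(sizeof(int32_t) == sizeof(float),
              "arrays of 4 elements would have the same size");

/* Hypothetical function: each branch declares its own block-scoped locals. */
int32_t sum_or_scale(int use_ints, int32_t seed)
{
    if (use_ints) {
        /* Lives only inside this block. */
        int32_t counts[4] = { seed, seed + 1, seed + 2, seed + 3 };
        int32_t total = 0;
        for (int i = 0; i < 4; i++)
            total += counts[i];
        return total;
    } else {
        /* Different type, disjoint lifetime: may reuse the slot of counts. */
        float weights[4] = { 0.5f, 1.0f, 1.5f, 2.0f };
        float total = 0.0f;
        for (int i = 0; i < 4; i++)
            total += weights[i] * (float)seed;
        return (int32_t)total;
    }
}
```

If the slots are shared, a single function-wide stack frame can give those bytes only one name and one type, which is the limitation this issue is about.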

Sep 12 '21 08:09 Wall-AF

We don't have a concept of a "block ID"; instead, each register or unique is reassigned using just one address. The reason is the complexity of operating on this data, and of handling user input where needed (one address is easier than a bunch of IDs and depths). Exactly the same would apply to stack frames if multiple copies were allowed. I'm also not sure what a stack hierarchy would look like. And we don't need to create anything new (i.e. custom blocks), only address-based reassignments.

Sep 13 '21 20:09 Danil6969

Okay, how about using two addresses? (I'm assuming here that the address used is the one associated with either the register or the stack address space.) The second would be associated with the code address space, and it would be the first address in the current decompilation unit where the variable was used. You would then be able to split out as needed.

I've not delved far into the class ghidra.app.plugin.core.decompile.actions.IsolateVariableAction, but I see that current stack-based varnodes seem to have the same mergeGroup for all instances at a particular code address, which means the option Split Out As New Variable is unavailable. I don't know whether a different test should be used for stack addresses, or something more complex!

Sep 16 '21 16:09 Wall-AF

Will this be placed on anyone's radar soon???

Sep 05 '22 07:09 Wall-AF

Sorry for the spam, but I was looking into this and I can't believe I have to create custom union types just to clean up the decompilation. Especially when working with custom layered structs, this messes things up more the more information you add.

Jan 01 '23 02:01 Swyter

Any news on this? @Swyter, how do you use your workaround?

May 03 '23 14:05 bommijn

I've successfully patched my assembly: in the area I wish to isolate, I changed the instructions of the form BP + x (where x is the current stack offset) to BP + y (where y is a new offset, larger than any current stack use). I also modified the SUB SP,xx to add the number of additional bytes used, so that stack analysis still works.

May 03 '23 14:05 Wall-AF

This only works when the required change fits into the size of x/y/xx (for a byte, you cannot change the value if you need a local variable (negative offset) more than 127 bytes away). Changing the size of the offset isn't an option, as this requires a longer instruction!
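
The size limit can be checked before patching. The following is a minimal sketch, assuming the displacement is a signed two's-complement field of 1, 2 or 4 bytes; the helper name fits_signed is made up for this example. A signed one-byte field covers -128 to 127.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if value fits a signed displacement field
 * that is width_bytes wide (only 1, 2 and 4 are accepted here). */
static bool fits_signed(int64_t value, unsigned width_bytes)
{
    assert(width_bytes == 1 || width_bytes == 2 || width_bytes == 4);
    int64_t max = ((int64_t)1 << (8 * width_bytes - 1)) - 1;
    int64_t min = -max - 1;
    return value >= min && value <= max;
}
```

For example, fits_signed(-0x90, 1) is false, so moving a local to BP - 0x90 would need a wider displacement and therefore a longer instruction, while fits_signed(-0x90, 2) is true.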

Jul 05 '23 14:07 Wall-AF

Hi @caheckman, please don't take this the wrong way, but as this has been around for a while now, is there a window for this to be looked into soon?

Jul 08 '23 12:07 Wall-AF

@caheckman I'm getting more and more functions to decompile that use the same stack reference for different variable types (especially the creation and destruction of what amount to temporary C++ objects) inside C++ code blocks such as if/for/while statements. This makes the implementation of this feature imperative in order to continue the project. Please, can you give me a glimmer of hope??? I appreciate you seem to be alone in handling these request types, and I would be willing to assist if there were some decent documentation of how the decompiler works! Thanks for your hard work.

Jul 11 '23 19:07 Wall-AF

Have you tried using unions of structs which in turn represent locals for each frame separately? Like union frames_FUN_xxxxxxxx { struct frame_0; struct frame_1; ... } and struct frame_n { local_off0_t; local_off1_t; ... }
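
Spelled out in C, the suggestion could look like the sketch below. Only the union name pattern comes from the comment above; the member names, types and offsets are invented for illustration, and a real frame would use the function's actual locals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the frame used by the first part of the function. */
struct frame_0 {
    int32_t counter;   /* offset 0x0 */
    int32_t limit;     /* offset 0x4 */
    char    name[8];   /* offset 0x8 */
};

/* Hypothetical view used by a later part, reusing the same bytes. */
struct frame_1 {
    float   scale;     /* offset 0x0 */
    float   bias;      /* offset 0x4 */
    int64_t handle;    /* offset 0x8 */
};

/* One union per function: every member starts at offset 0,
 * so each struct is an alternative interpretation of the frame. */
union frames_FUN_xxxxxxxx {
    struct frame_0 f0;
    struct frame_1 f1;
};

/* The same stack bytes get a different name and type in each view. */
static_assert(offsetof(struct frame_0, limit) == offsetof(struct frame_1, bias),
              "both views place a local at the same offset");
```

Applying the union as the type of the stack region then lets each access pick the member that matches the part of the function it belongs to.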

Jul 11 '23 19:07 Danil6969

@Danil6969 thanks for this great suggestion. It works well for different structures using the exact same location; however, and here's the rub, most of the datatypes occupy overlapping stack locations! To combat that, you end up with deeply nested union/struct combos, which is a bit impractical!

Jul 12 '23 10:07 Wall-AF

@caheckman as OO languages allow the creation/destruction of local (hence stack) objects within code blocks (i.e. within an if statement, a for loop, or even a for loop's index), this facility would help significantly. Is there a simple mechanism whereby code block structures could be identified and used to house stack copies that only get copied when a user indicates the necessity?

Sep 03 '23 11:09 Wall-AF

If the current process is defined as:

A Function has a Stack
A Stack has a set/list of Stack Addresses
A Stack Address has some Attributes (datatype, name, etc.)

Couldn't we add another layer of indirection such that it would be:

A Function has a Stack
A Stack has a set/list of Stack Addresses
A Stack Address has a map of Attributes applying to Function Code Addresses

The Stack Analyser would be able to determine the first Function Code Address at which each particular Stack Address is used by prespecified datatypes and, once all datatypes have been identified, flow analysis could fill in the Function Code Addresses that use the same datatype. Also, where Classes are defined, their constructors and destructors could identify the first and last Function Code Addresses to which they apply. Finally, a User Facility would allow the analyst to specify the first and last Function Code Addresses for other datatypes defined within the function, which would trigger the flow analyser to fill in the missing Function Code Addresses.
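
The extra layer of indirection could be modelled roughly as in the C sketch below. This is a hypothetical data model for discussion only, not Ghidra's actual structures; every type, field and function name here is invented, and code address ranges are assumed to be inclusive and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attributes of a Stack Address over an inclusive range of code addresses. */
struct stack_attrs {
    uint64_t first_code_addr;  /* first Function Code Address covered */
    uint64_t last_code_addr;   /* last Function Code Address covered */
    const char *name;
    const char *datatype;
};

/* A Stack Address holding a map of attributes keyed by code address range. */
struct stack_address {
    int32_t frame_offset;
    const struct stack_attrs *attrs;
    size_t attr_count;
};

/* Return the attributes in effect at code_addr, or NULL if none apply. */
static const struct stack_attrs *
attrs_at(const struct stack_address *sa, uint64_t code_addr)
{
    assert(sa != NULL);
    for (size_t i = 0; i < sa->attr_count; i++) {
        const struct stack_attrs *a = &sa->attrs[i];
        if (code_addr >= a->first_code_addr && code_addr <= a->last_code_addr)
            return a;
    }
    return NULL;
}
```

With this shape, one stack offset can carry one name and datatype for the code range covered by a temporary object and another for the range after it, and a lookup at any instruction address selects the binding in effect there.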

Unfortunately I don't have enough detailed knowledge of Ghidra's internals to complete this myself!

This would also assist in solving https://github.com/NationalSecurityAgency/ghidra/issues/5769#issuecomment-1715842117.

Sep 13 '23 10:09 Wall-AF
