ghidra
"Split out as new variable" support for stack variables
Right now, this feature only works with registers (and it works well!). For stack variables, the option isn't even offered. It's a very useful feature when the same variable needs different names and types in different places.
This feature is desperately needed
You can see how reassignment works for uniques (those labeled `HASH`) and registers if you look at the disassembly, e.g. an assignment like `HASH:39f5e6c302b:4 = uVar100`. Ghidra also does this automatically, e.g. when the variable goes out of scope. Internally it just sees that the previous definition of this varnode has expired, so it uses a new one with a new high-level (decompiler-side) type and name. The stack frame, on the other hand, is single and global within the function: it is shared among code labels, blocks, etc., so nothing in it can just "expire". You may notice this too if you commit a change for any stack local and open the stack frame from the disassembly. Only one copy of the frame is currently supported for the whole function; maintaining several copies would not be trivial. Does this come up in cases other than the "spill and reload" idiom that compilers use to temporarily store something on the stack when register pressure leaves no room for it? If it really makes sense, then how would such dynamic stack frame(s) be edited?
I've found a valid use case for such a feature; it's described in https://github.com/NationalSecurityAgency/ghidra/issues/1510. For now, the best option is to copy the decompilation piecewise into a separate text file: once before you change the stack frame, and again after you remap the locals for the other part of the function. Maybe it's possible to add more tables for the stack frame, but currently we don't even know how to handle such copies in the stack frame editor.
Is it possible to add a `Block ID` and `Depth Number` when assigning stack locations as types and variables? For example, in an `if`/`while`/... block, user declarations are scoped within that block, effectively creating a stack hierarchy.
We also need to be able to create our own blocks, as I've discovered stack reuse in monolithic functions, effectively in the same block!
We don't have a concept such as a "block id"; instead, we reassign each register or unique by just one address. The reason is the complexity of operating on this data and on user input where necessary (one address is easier than a bunch of ids and depths). Exactly the same would apply to stack frames if multiple copies were allowed. I'm also not sure what a stack hierarchy would look like. And we don't need to create anything new (i.e. custom blocks), only address-based reassignments.
Okay, how about using two addresses? (I'm assuming here that the address used is the one associated with either the `register` or `stack` address space.) The second would be associated with the `code` address space: it would be the first address in the current decompilation unit where the variable is used. You could then split out as needed.
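To make the two-address idea concrete, here is a minimal sketch in Python. It is not Ghidra code: `SplitVariableTable`, `split_out`, and `lookup` are hypothetical names invented for illustration, and the sketch assumes that "the most recent first-use address at or before the current code address" is a good enough rule, which ignores real control flow.

```python
from bisect import bisect_right, insort
from collections import defaultdict


class SplitVariableTable:
    """Hypothetical table keyed by (storage address, first-use code address)."""

    def __init__(self):
        # (space, offset) -> sorted list of first-use code addresses
        self._starts = defaultdict(list)
        # ((space, offset), first_use) -> variable name
        self._names = {}

    def split_out(self, space, offset, first_use, name):
        """Declare that a new variable begins at code address `first_use`."""
        storage = (space, offset)
        if (storage, first_use) not in self._names:
            insort(self._starts[storage], first_use)
        self._names[(storage, first_use)] = name

    def lookup(self, space, offset, pc):
        """Return the name in effect at code address `pc`, or None."""
        storage = (space, offset)
        starts = self._starts.get(storage, [])
        i = bisect_right(starts, pc)
        if i == 0:
            return None  # no variable has been split out at or before pc
        return self._names[(storage, starts[i - 1])]


table = SplitVariableTable()
table.split_out("stack", -0x10, 0x401000, "count")
table.split_out("stack", -0x10, 0x401080, "tmp_obj")
print(table.lookup("stack", -0x10, 0x401040))  # count
print(table.lookup("stack", -0x10, 0x401090))  # tmp_obj
```

Linear address order is only an approximation here; a real implementation would have to follow the function's control flow rather than compare raw addresses.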
I've not delved far into the class `ghidra.app.plugin.core.decompile.actions.IsolateVariableAction`, but I see that current `stack`-based `varnodes` seem to have the same `mergeGroup` for all instances at a particular code address, which means the option `Split Out As New Variable` is unavailable. I don't know whether a different test should be used for stack addresses, or whether something more complex is needed!
Will this be placed on anyone's radar soon???
Sorry for the spam, but I was looking into it and I can't believe I have to build custom union types just to clean up the decompilation. Especially when working with custom layered `struct`s, this messes things up more with every piece of information you add.
Any news on this? @Swyter, how do you use your workaround?
I've successfully updated my assembly to change the offset for the instructions in the area I wish to isolate, from the form `BP + x` (where `x` is the value of the current stack offset) to `BP + y` (where `y` is a new offset, larger than any current stack use), and also modified the `SUB SP,xx` to add the number of additional bytes used, so that stack analysis still works.
This only works when the required change fits into the size of `x`/`y`/`xx` (for a byte, you cannot change the value if you need a local variable (negative offset) more than 127 bytes away). Changing the size of the offset isn't an option, as that requires a longer instruction!
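The size constraint can be checked before patching. The sketch below is a hedged illustration, not part of any tool: `fits_signed` and `can_patch_in_place` are hypothetical helper names, and it assumes that both the `BP`-relative displacement and the `SUB SP,xx` immediate are encoded as signed two's-complement values of the given byte width (a signed 8-bit field covers -128..127).

```python
def fits_signed(value: int, size_bytes: int) -> bool:
    """True if `value` is representable as a signed integer of `size_bytes` bytes."""
    bits = 8 * size_bytes
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def can_patch_in_place(new_offset: int, disp_size: int,
                       new_frame_size: int, imm_size: int) -> bool:
    """Check that the new BP-relative offset and the enlarged frame size both
    fit in the existing encodings, so that no instruction has to grow."""
    return fits_signed(new_offset, disp_size) and fits_signed(new_frame_size, imm_size)


# Moving a local to BP-0x7E with a one-byte displacement and immediate fits.
print(can_patch_in_place(-0x7E, 1, 0x7E, 1))  # True
# BP-0x90 does not fit in one byte, even if the SUB SP immediate is two bytes.
print(can_patch_in_place(-0x90, 1, 0x90, 2))  # False
```

If the check fails, the instruction would need a wider displacement, which is exactly the case described above where the patch no longer fits in place.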
Hi @caheckman, please don't take this the wrong way, but as this has been around for a while now, is there a window for this to be looked into soon?
@caheckman I'm getting more and more functions to decompile that use the same stack reference for different variable types (especially the creation and destruction of what amount to temporary C++ objects) within C++ code blocks such as `if`/`for`/`while` statements. This makes the implementation of this feature imperative in order to continue the project. Please, can you give me a glimmer of hope??? I appreciate you seem to be alone in handling these request types, and I would be willing to assist if there were some decent documentation of how the decompiler works! Thanks for your hard work.
Have you tried using unions of structs, where each struct represents the locals of one frame? Like `union frames_FUN_xxxxxxxx { struct frame_0; struct frame_1; ... }` and `struct frame_n { local_off0_t; local_off1_t; ... }`
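As a rough illustration of that layout, the sketch below models the union-of-frames idea with Python's `ctypes`. The field names and types (`counter`, `flags`, `ratio`, `tag`) are invented for the example; in Ghidra the equivalent types would be created in the Data Type Manager rather than in Python.

```python
import ctypes


class Frame0(ctypes.Structure):
    """Locals as seen by the first part of the function (hypothetical)."""
    _fields_ = [("counter", ctypes.c_int32),
                ("flags", ctypes.c_int32)]


class Frame1(ctypes.Structure):
    """Locals as seen by a later part reusing the same stack bytes (hypothetical)."""
    _fields_ = [("ratio", ctypes.c_float),
                ("tag", ctypes.c_char * 4)]


class FramesFun(ctypes.Union):
    """Both views overlay the same storage, like `union frames_FUN_xxxxxxxx`."""
    _fields_ = [("frame_0", Frame0),
                ("frame_1", Frame1)]


frames = FramesFun()
frames.frame_0.counter = 0x3F800000   # bit pattern of the float 1.0
print(ctypes.sizeof(FramesFun))       # 8
print(frames.frame_1.ratio)           # 1.0
```

Each frame struct starts at offset 0 of the union, so the same bytes can be read under either set of names and types.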
@Danil6969 thanks for this great suggestion. It works well for different structures using the exact same location; however, and here's the rub, most of the datatypes occupy overlapping stack locations! To handle that, you end up with deeply nested `union`/`struct` combos, which is a bit impractical!
@caheckman as OO languages allow the creation/destruction of local (hence stack) objects within code blocks (i.e. within an `if` statement, a `for` loop, or even a `for` loop's index), this facility would help significantly. Is there a simple mechanism whereby code block structures could be identified and used to house stack copies that only get copied when a user indicates the necessity?
If the current process is defined as:
- A `Function` has a `Stack`
- A `Stack` has a set/list of `Stack Addresses`
- A `Stack Address` has some `Attributes` (datatype, name, etc.)
Couldn't we add another layer of indirection such that it would be:
- A `Function` has a `Stack`
- A `Stack` has a set/list of `Stack Addresses`
- A `Stack Address` has a map of `Attributes` applying to `Function Code Addresses`
The `Stack Analyser` would be able to determine the first `Function Code Address` at which each particular `Stack Address` is used by prespecified datatypes, and once all datatypes have been identified, flow analysis could fill in the `Function Code Addresses` that use the same datatype. Also, where classes are defined, their constructors and destructors could identify the `first` and `last` `Function Code Addresses` to which they apply. Finally, a `User Facility` would allow the analyst to specify the `first` and `last` `Function Code Addresses` for other datatypes defined within the function, which would trigger the flow analyser to fill in the missing `Function Code Addresses`.
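A minimal Python sketch of the proposed extra layer follows. All class and method names here are hypothetical and do not correspond to Ghidra's API. As a simplification, each `Attributes` entry is attached to one inclusive `first`..`last` code address range, whereas the proposal above would let flow analysis fill in an arbitrary set of code addresses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Attributes:
    name: str
    datatype: str


@dataclass
class StackAddress:
    offset: int
    # Each entry: (first_code_addr, last_code_addr, attributes), inclusive range.
    ranges: list = field(default_factory=list)

    def assign(self, first: int, last: int, attrs: Attributes) -> None:
        """Attach attributes to the code addresses first..last."""
        if first > last:
            raise ValueError("first must not exceed last")
        for lo, hi, _ in self.ranges:
            if first <= hi and lo <= last:
                raise ValueError("code range overlaps an existing assignment")
        self.ranges.append((first, last, attrs))

    def attributes_at(self, code_addr: int) -> Optional[Attributes]:
        """Return the attributes in effect at `code_addr`, if any."""
        for lo, hi, attrs in self.ranges:
            if lo <= code_addr <= hi:
                return attrs
        return None


@dataclass
class Stack:
    addresses: dict = field(default_factory=dict)  # offset -> StackAddress

    def at(self, offset: int) -> StackAddress:
        return self.addresses.setdefault(offset, StackAddress(offset))


@dataclass
class Function:
    name: str
    stack: Stack = field(default_factory=Stack)


func = Function("FUN_00401000")
slot = func.stack.at(-0x18)
slot.assign(0x401000, 0x40103F, Attributes("index", "int"))
slot.assign(0x401040, 0x40109F, Attributes("tmp", "std::string *"))
print(slot.attributes_at(0x401050).name)  # tmp
```

With this shape, the same stack offset can carry a different name and datatype depending on which code address is being decompiled, which is the behaviour the constructor/destructor ranges would feed into.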
Unfortunately I don't have enough detailed knowledge of Ghidra's internals to complete this myself!
This would also assist in solving https://github.com/NationalSecurityAgency/ghidra/issues/5769#issuecomment-1715842117.