dewolf
dewolf copied to clipboard
Handle Case-Merging for ranges in integer size
Proposal
Compilers may try to merge cases as integers in special cases as opposed to range simplification when the ranges are not continuous. Take test2
from test_switch
as an example:
int test2()
{
int month;
/* Input month number from user */
printf("Enter month number(1-12): ");
scanf("%d", &month);
switch(month)
{
/* Group all 31 days cases together */
case 1:
case 3:
case 5:
case 7:
case 8:
case 10:
case 12:
printf("31 days");
break;
/* Group all 30 days cases together */
case 4:
case 6:
case 9:
case 11:
printf("30 days");
break;
/* Remaining case */
case 2:
printf("28/29 days");
break;
default:
printf("Invalid input! Please enter month number between 1-12");
}
return 0;
}
The corresponding MLIL instructions from binaryninja and our decompilation result for it is illustrated in this picture:
The conditions on the lines 12 and 13 exhibit the optimization which has taken place here. Instead of creating a jump table with redundant entries, the compiler merges all case constants into an integer utilizing shifting (line 11).
For example, the first case (31 days) should be reached for the constants 1, 3, 5, 7, 8, 10 and 12. The compiler generates the integer 0x15aa
to check for these cases. In order to match the input number, 1 is shifted by the input integer, and we check if the bit at that position is set in 0x15aa
.
bin(0x15aa) = 1010110101010
Considering the shift by one, each bit set in this integer corresponds to one of the merged cases by its position. The 1 at the second position corresponds to 2-1=1
, while the 1 at the fourth position corresponds to 4-1=3
. This way, we can extract all convoluted cases from the utilized constants.
Approach
Implement a Pipeline Stage for the preprocessing pipeline detecting cases like these to reconstruct the correct switch statement. The 1 shifted by an input variable and the checks of the resulting variable should be the identifying features for these cases.
The result should undo the compiler optimization into consecutive if statements, e.g.
if ((var_0 == 1) || (var_0 == 3) || ...)
Of course, the values checked for should not have overlapping bits, to ci & cj = 0 foreach i, j checked
I took the route of unrolling the bit field inside the Branch.condition
. This is possible, since the bit-shift 0x1 << month
gets forwarded by expression-propagation.
However I ran into the following problems:
How can to create a Condition like (var_0 == 1) || (var_0 == 3) || ...
? Preferably I would do something like Condition(logical_or, list_of_comparisons)
but since we support only binary operations I went with nested logical or...
Z3 does not like logical_or
.
Maybe my approach of crafting a Condition
is totally wrong? Please see the POC
Relevant: https://github.com/fkie-cad/dewolf/compare/main...115-handle-case-merging-for-ranges-in-integer-size#diff-40dcaf175ad0532c557a7f82cfe696d0949b7843616ba3cb88406f022cb6a8e7R44-R52
Exception in thread Thread-8:
Traceback (most recent call last):
File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/user/.tools/binaryninja/plugins/../python/binaryninja/plugin.py", line 928, in run
self.task.run()
File "/home/user/.binaryninja/plugins/dewolf/__init__.py", line 41, in run
decompile(self.bv, self.function)
File "/home/user/.binaryninja/plugins/dewolf/__init__.py", line 24, in decompile
task = decompiler.decompile(function, options)
File "/home/user/.binaryninja/plugins/dewolf/decompile.py", line 55, in decompile
pipeline.run(task)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/pipeline/pipeline.py", line 97, in run
instance.run(task)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/pipeline/controlflowanalysis/restructuring.py", line 42, in run
self.t_cfg = TransitionCFG.generate(task.graph)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/graphs/restructuring_graph/transition_cfg.py", line 157, in generate
transition_cfg._process_node(node, cfg, node_translation)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/graphs/restructuring_graph/transition_cfg.py", line 246, in _process_node
self._process_conditional_node(node, cfg, node_transition)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/graphs/restructuring_graph/transition_cfg.py", line 277, in _process_conditional_node
label = self.condition_handler.add_condition(comparision.condition)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/ast/condition_symbol.py", line 85, in add_condition
z3_condition = PseudoLogicCondition.initialize_from_condition(condition, self._logic_context)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/logic/z3_logic.py", line 270, in initialize_from_condition
z3_condition = Z3Implementation.get_z3_condition_of(condition, context)
File "/home/user/.binaryninja/plugins/dewolf/decompiler/structures/logic/z3_implementations.py", line 232, in get_z3_condition_of
return Z3Implementation.OPERATIONS[condition.operation](operand_1, operand_2)
KeyError: <OperationType.logical_or: 28>
@ebehner Could you take a look at this? :)
I understand the problem and know where it comes from. An object of type Condition
can only have the operations ==
, !=
, <
, <=
, >
, >=
all signed and unsigned. Your constructed condition has a different structure and is therefore unexpected. It took a lot of effort to fulfill this. Violating this property can lead to problems, but changing this condition to something like cond == True
will not have the wanted impact on switch recovery. Here, we would have to update the function _process_conditional_node
in the class TransitionCFG
as well as the function add_condition
in ConditionHandler
.
/cib
Branch issue-115-Handle_Case-Merging_for_ranges_in_integer_size created!