sca-fuzzer Generator Gadgets

This PR implements the concept of gadgets into Revizor's program generator. Here, a gadget refers to a small list of assembly instructions placed in a specific order for a specific reason. I've made this PR description somewhat lengthy to make sure I document my thought process in implementing this feature.

Benefits of Gadgets

Considering Spectre v1, the below gadget (in pseudocode), which features two memory load instructions, would be useful for causing an extra line in the CPU cache to be allocated during speculative execution.

LOAD REG1, [REG2] # REG1 is destination, REG2 is source address
LOAD REG3, [REG1] # REG1's result from previous load is used as the next load's address

Or consider another gadget, potentially useful for increasing the CPU's speculative execution window by creating a chain of data-dependent instructions:

MUL REG1, REG2, REG3
MUL REG4, REG1, REG3 # depends on REG1
MUL REG2, REG4, REG3 # depends on REG4
MUL REG3, REG2, REG1 # depends on REG2
# ...

Because Revizor's program generator randomly places assembly instructions, specific assembly sequences like these won't easily be generated. Gadgets make this possible.

Gadget File

At runtime, the user can specify the path to a gadget file through a config field (gadget_file). This file is formatted in JSON and uses the same syntax for specifying assembly instructions as Revizor's ISA specification parser. Within this file, the user can place interesting assembly gadgets he/she wants to see in Revizor's generated programs. Each gadget has a name and an optional weight (used to tune how often gadgets appear in generated programs). Here's an example of what one gadget might look like:

{
    "name": "double_subtraction",
    "description": "Two consecutive subtraction instructions.",
    "weight": 0.5,
    "instructions":
    [
        {
            "name": "SUB", "category": "general", "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
            ],
            "implicit_operands": [
                {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
            ]
        },
        {
            "name": "SUB", "category": "general", "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR", "SP"]},
                {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
            ],
            "implicit_operands": [
                {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
            ]
        }
    ]
}

If the generator chooses to place this particular gadget into a program, two randomized SUB instructions will be placed (consecutively) within a basic block.

(I've created an example gadget file at src/arm64/tests/example_gadgets.json).

Gadget Selection

As the program generator fills basic blocks with instructions, it may choose to instead select a random gadget from the gadget file and place it into the program. This pseudocode describes the basic idea of how it decides:

while (program isn't full yet)
{
    bb = choose_random_basic_block();
    if (random_chance_to_use_gadget())
    {
        gadget = select_random_gadget_that_fits();
        bb.insert(gadget.instructions);
    }
    else
    {
        instruction = generate_random_instruction();
        bb.insert(instruction);
    }
}

The randomness of gadget placement is configurable using a few new config fields:

CONF.avg_gadgets_per_bb dictates, roughly, how many gadgets are selected and placed into each basic block of the test case.
CONF.max_gadgets_per_bb forces a maximum number of gadgets that can be placed into each basic block.

Gadget placement will also not occur if:

A gadget is too big (i.e. won't fit in CONF.program_size)
The generator is in the middle of placing two memory access instructions
A gadget would exceed the average number of memory accesses (CONF.avg_mem_accesses)

Operand Aliasing

I've also given gadgets the ability to specify unique operand aliases, to control how instruction operands (such as registers) are used within the same gadget. Defining an operand alias (let's say it's named REG_A) as a GPR operand allows us to specify REG_A in the operands of the gadget's instructions. When the gadgets is placed into the code, all instances of REG_A will be replaced by the same randomly-chosen register.

See this gadget as an example:

{
    "name": "spectre_double_load",
    "description": "Two consecutive load instructions in a spectre-like style (with the first load result used as the second load's address)..",
    "weight": 0.75,
    "operand_aliases":
    [
        {
            "name": "REG_A",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        },
        {
            "name": "REG_B",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        },
        {
            "name": "REG_B",
            "operand": {"dest": false, "src": true, "comment": "", "type_": "REG", "width": 32, "values": ["GPR"]}
        }
    ],
    "instructions":
    [
        {
            "name": "LDR",
            "category": "general",
            "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "type_": "REG", "width": 64, "values": ["REG_A"]},
                {"dest": false, "src": true, "type_": "MEM", "width": 64, "values": ["REG_C"]},
                {"dest": false, "src": true, "type_": "IMM", "values": ["[-256-255]"]}
            ],
            "implicit_operands": []
        },
        {
            "name": "LDR",
            "category": "general",
            "control_flow": false,
            "operands": [
                {"dest": true, "src": false, "type_": "REG", "width": 64, "values": ["REG_B"]},
                {"dest": false, "src": true, "type_": "MEM", "width": 64, "values": ["REG_A"]},
                {"dest": false, "src": true, "type_": "IMM", "values": ["[-256-255]"]}
            ],
            "implicit_operands": []
        }
    ]
}

The operand_aliases allow this gadget to use random registers to fill in REG_A, REG_B, and REG_C, but keep their values consistent across all uses in each of the two load instructions.

Other Additions

Literal Operand

To support extra operands that may not fit under the existing types, I've implemented a LiteralOperand, which allows for any arbitrary literal string to be used as an operand for an assembly instruction. Take this EOR instruction specification as an example:

{
    "name": "EOR",
    "category": "general",
    "control_flow": false,
    "operands": [
        {"dest": true, "src": false, "type_": "REG", "width": 32, "values": ["W0"]},
        {"dest": false, "src": true, "type_": "REG", "width": 32, "values": ["W0"]},
        {"dest": false, "src": true, "type_": "REG", "width": 32, "values": ["W0"]},
        {"type_": "LITERAL", "values": ["ROR #9"]}
    ],
    "implicit_operands": [
        {"type_": "FLAGS", "width": 0, "src": false, "dest": false, "values": ["w", "", "", "w", "w", "", "", "", "w"]}
    ]
}

The final operand is a "LITERAL", and contains the value "ROR #9". This type of operand will be placed in the assembly code as-is. In this example, it allows the EOR instruction to have a right-rotate barrel-shift operand:

EOR W0, W0, W0, ROR #9

Jan 26 '23 16:01 cwshugg

Hi @OleksiiOleksenko - have you had the chance to look at this? I had another idea that could be added to this PR; what do you think of this?

Currently I've implemented this gadget system to forbid any control-flow instructions. The motivation here was to prevent the disruption of your test case's DAG; if control-flow instructions were allowed in gadgets, the user could force more basic blocks to be added to the test case that aren't a part of the official DAG.

However, what do you think about allowing small, self-contained, loops to be present? A "loop gadget" might look like this:

MOV X0, #4
LOOP_GADGET_START:
SUBS X0, X0, #1
B.PL LOOP_GADGET_START

Something like this might allow for forced misprediction to occur, without creating infinite loops or doing anything unreasonable. If this isn't worth implementing as a gadget, perhaps implementing loops would be best as a separate feature of the program generator. (I might actually prefer the latter.) What do you think?

Feb 08 '23 15:02 cwshugg

I'm also implementing a way to "alias" operands in a gadget, to allow for the same unique operand to be used in two different places in the same gadget. I'll push this up as soon as it's done, but for now, I'll mark the PR as a draft. More details to come on this particular part of the PR.

Feb 08 '23 21:02 cwshugg

sca-fuzzer sca-fuzzer copied to clipboard

Generator Gadgets

Benefits of Gadgets

Gadget File

Gadget Selection

Operand Aliasing

Other Additions

Literal Operand

sca-fuzzer
sca-fuzzer copied to clipboard