
Refactor to parse code using ast instead of regex

Open mashdragon opened this issue 2 years ago • 1 comments

Python ships with a built-in module, ast, that parses Python source code into an abstract syntax tree (AST).

We should use ast instead of regex for creating logic bugs. First, we parse the code into an AST object. Then, we can modify the AST to reflect the logic bug we wish to create. Finally, we use a module like astor to convert the AST object back into Python source code.
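The three steps above can be sketched with the standard library alone. This is only a minimal illustration, not OpenBugger's implementation: the FlipLessThan class and the is_small sample function are hypothetical, and the sketch uses ast.unparse (available since Python 3.9) in place of astor so that it runs without third-party packages.

```python
import ast


class FlipLessThan(ast.NodeTransformer):
    """Hypothetical mutation: turn every `<` comparison into `<=`."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # handle comparisons nested inside this one
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op for op in node.ops]
        return node


source = "def is_small(x):\n    return x < 10\n"

tree = ast.parse(source)           # 1. parse the source into an AST
tree = FlipLessThan().visit(tree)  # 2. modify the AST to inject the bug
buggy_source = ast.unparse(tree)   # 3. convert the AST back into source
print(buggy_source)
```

Because the edit is made on Compare nodes rather than on text, a `<` that appears inside a string or comment is never touched, which is the main advantage over a regex.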

Here is an example that can help solve #5 by accurately locating and selectively removing individual variables from global statements:

import ast
import random

import astor  # third-party package, used below to turn the AST back into source

def remove_random_global(tree):
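    # Find all global statements and their parents in the AST.
    # Only list-valued `body` fields are searched: Lambda and IfExp nodes
    # store a single expression in `body`, which cannot be iterated.
    globals_and_parents = [
        (node, parent)
        for parent in ast.walk(tree)
        if isinstance(getattr(parent, 'body', None), list)
        for node in parent.body
        if isinstance(node, ast.Global)
    ]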
    
    # If there are no global statements, leave the tree unchanged
    if not globals_and_parents:
        return
    
    random_global, parent = random.choice(globals_and_parents)
    if len(random_global.names) > 1:
        # Remove a single variable from the declaration
        random_var = random.choice(random_global.names)
        random_global.names.remove(random_var)
    else:
        # Remove the entire global statement
        parent.body.remove(random_global)

code = '''
a = 0
b = 1
result = 0
def fib_next():
    """ Computes the next Fibonacci number """
    global a, b
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a
'''

tree = ast.parse(code)
remove_random_global(tree)
print(astor.to_source(tree))

Sample result:

>>> print(astor.to_source(tree))
a = 0
b = 1
result = 0


def fib_next():
    """ Computes the next Fibonacci number """
    global a
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a

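Notice that global a, b has changed to global a. Because b is still assigned inside the function but is no longer declared global, Python treats it as a local variable, so calling fib_next() raises UnboundLocalError: local variable 'b' referenced before assignment.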
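One way to confirm that a mutation really introduces the bug is to execute the mutated source and call the function. The following is a minimal sketch that assumes the mutated source from the sample result; the namespace dictionary and the "<mutated>" filename are illustrative choices. The exact error message differs between Python versions (3.11 and later word it differently), so the sketch checks only the exception type.

```python
# Mutated source, copied from the sample result
mutated = '''
a = 0
b = 1
result = 0


def fib_next():
    """ Computes the next Fibonacci number """
    global a
    global result
    a_temp = b
    b += a
    a = a_temp
    result = a
'''

# Run the module-level code in an isolated namespace
namespace = {}
exec(compile(mutated, "<mutated>", "exec"), namespace)

try:
    namespace["fib_next"]()
    raised = False
except UnboundLocalError as exc:
    # `b` is now local to fib_next and is read before it is assigned
    raised = True
    print(type(exc).__name__)
```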

We can use similar techniques to introduce other types of logic bugs into Python scripts.
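As one illustration of another bug type, the sketch below swaps a single arithmetic operator chosen at random. It is only a sketch under stated assumptions: the SWAPS table, the swap_random_operator function, and the total sample function are hypothetical names, and ast.unparse (Python 3.9+) stands in for astor.

```python
import ast
import random

# Hypothetical swap table: each operator type is replaced by its value
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}


def swap_random_operator(tree, rng=random):
    """Swap the operator of one randomly chosen arithmetic node, in place.

    Returns True if a mutation was made, False if there was no candidate.
    """
    candidates = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.BinOp, ast.AugAssign)) and type(node.op) in SWAPS
    ]
    if not candidates:
        return False
    target = rng.choice(candidates)
    target.op = SWAPS[type(target.op)]()
    return True


code = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
tree = ast.parse(code)
changed = swap_random_operator(tree)
print(ast.unparse(tree))
```

The sample contains exactly one candidate, so `s += x` always becomes `s -= x` here. On larger inputs the rng parameter accepts a seeded random.Random instance to make the choice reproducible.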

Furthermore, ast can also tell us whether a Python program is syntactically valid. If it is not, ast.parse raises a SyntaxError that describes the problem:

>>> ast.parse("5 = 5")
  File "<unknown>", line 1
SyntaxError: cannot assign to literal
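This check could be wrapped in a small helper, for example to verify that generated code still parses. A minimal sketch follows; the check_syntax name and its return format are assumptions, and the wording of the error message varies between Python versions.

```python
import ast


def check_syntax(source):
    """Return (True, None) if `source` parses, else (False, description)."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        # lineno and msg are attributes of SyntaxError
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, None


print(check_syntax("x = 5"))
print(check_syntax("5 = 5"))
```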

mashdragon • Jan 11 '23 01:01

See https://github.com/furlat/OpenBugger/blob/main/notebooks/ast_notebook.ipynb; there are some TODOs at the end :) Thanks again for the input.

furlat • Jan 12 '23 01:01