[Bug Report] Data Flow Interruption with Function Parameters and Variable Arguments in Python

Open gravingPro opened this issue 1 year ago • 7 comments

I've encountered issues in CodeQL regarding data flow interruption. Here are the details:

1. Function Parameter Passing Interruption

In the code below:

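def read_sql(sql):
    spark.sql(sql)  # custom sink: the query string reaches spark.sql

def process(func, args):
    func(*args)  # unpacks the tuple into positional arguments

sql = request.json['data']  # source: user-controlled request data
process(func=read_sql, args=(sql,))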

CodeQL does not detect that the tainted value sql reaches read_sql when the call goes through process, which receives the function and its arguments as parameters and then invokes func(*args). Data flow tracking is interrupted at the point where the function is passed as a parameter and later called with unpacked arguments.

2. *args and **kwargs Interruption

CodeQL also loses track of data flow through *args (variable positional arguments) and **kwargs (variable keyword arguments). In the example above, unpacking *args inside process prevents CodeQL from recognizing the flow of sql into the sink. The same happens in other code that forwards arguments through these constructs.
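A minimal sketch of the forwarding pattern described above, written so it runs without Spark or a web framework. The `calls` list and the literal query string are assumptions standing in for the custom sink and for `request.json['data']`; they are not part of the original report. At runtime both calls deliver the tainted value to `read_sql`, which is the flow the analysis is expected to follow.

```python
calls = []  # hypothetical recorder standing in for the custom sink


def read_sql(sql):
    # Stand-in for the function that would call spark.sql(sql).
    calls.append(sql)


def process(func, *args, **kwargs):
    # Forwards positional and keyword arguments unchanged to func.
    func(*args, **kwargs)


tainted = "SELECT 1"  # stand-in for request.json['data']

process(read_sql, tainted)      # value travels through *args
process(read_sql, sql=tainted)  # value travels through **kwargs
```

Both routes end in the same parameter of `read_sql`, so a query that reports one should in principle report the other.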

These problems also occur with multithreading and multiprocessing APIs such as threading.Thread, multiprocessing.Process, concurrent.futures.ThreadPoolExecutor, and concurrent.futures.ProcessPoolExecutor.
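The concurrency APIs listed above share the same shape: the callable and its arguments are handed over separately and invoked later by the library. The sketch below shows this for `threading.Thread` and `ThreadPoolExecutor.submit`; `multiprocessing.Process` and `ProcessPoolExecutor` accept the same `target`/`args` and `submit(fn, *args)` forms. The `reached` list and the literal query are assumptions used only to make the example observable, not part of the original report.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

reached = []  # hypothetical recorder standing in for the custom sink


def read_sql(sql):
    # Stand-in for the function that would call spark.sql(sql).
    reached.append(sql)
    return sql


tainted = "SELECT 1"  # stand-in for request.json['data']

# The thread invokes read_sql(tainted) on our behalf.
worker = threading.Thread(target=read_sql, args=(tainted,))
worker.start()
worker.join()

# The executor invokes read_sql(tainted) in a pool thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    result = pool.submit(read_sql, tainted).result()
```

In both cases there is no direct call expression `read_sql(tainted)` in the source, which is likely why the flow is harder for a static analysis to connect.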

I hope this description helps in identifying and resolving these problems. Looking forward to a timely fix or further guidance on handling such complex data flow tracking scenarios.

Best regards

gravingPro, Oct 14 '24 13:10