codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Wrong global dataflow analyse in C

Open icy17 opened this issue 3 years ago • 2 comments
trafficstars

Hi! I write a small C code, and I want to find if there is a dataflow from malloc to free.And malloc and free in different functions. But I got a wrong result. The result shows that only malloc in test5 has related dataflow, but I think malloc in test4 should in the result...I don't know if my ql code is wrong. C code :

typedef struct{
    int a;
    char* b;
}test;

typedef struct{
    int a;
    test* b;
}test_big;


void test4(test_big* p)
{
    test* a = malloc(200);
    p->b = a;
    a->b = malloc(2);
    return;
}
void test5(test_big* p)
{
    test* a;
    
    a->b = malloc(2);
    p->b = a;
    return;
}
int main(void)
{
    char* a;
    test_big* b;
    test4(b);
    free(b->b->b);
    test5(b);
    free(b->b->b);
}

QL:

class TestConfiguration extends DataFlow::Configuration {
  TestConfiguration() { this = "TestConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    exists(FunctionCall fc |
      fc.getTarget().hasName("malloc")
      and (fc = source.asExpr())      
    )
  }
  override predicate isSink(DataFlow::Node sink) {
    // sink.asExpr()
    exists(FunctionCall fc |
      fc.getTarget().hasName("free")
      and fc.getAnArgument() = sink.asExpr()
    )
  }
}


from FunctionCall malloc, TestConfiguration cfg, DataFlow::Node source
where malloc.getTarget().hasName("malloc")
and source.asExpr() = malloc
and exists(Expr expr | 
  not expr.getEnclosingFunction() = malloc.getEnclosingFunction()
  and cfg.hasFlow(source, DataFlow::exprNode(expr)))

icy17 avatar Sep 26 '22 07:09 icy17

Hi @icy17,

The good news: The code you've written looks fine! The bad news: test4 is just a really difficult example for our dataflow library. The reason is as follows:

Consider the code in test4. There are two calls to malloc in that function.

  • First, the library finds that a call to malloc(200) ends up in p->b. Then, after that assignment, there's an assignment to a->b. The dataflow library concludes that this assignment doesn't give any useful dataflow since we're currently tracking the malloc call that ended up in p->b, and this assignment writes to a->b. However, since p->b and a alias due to the previous line this is in fact relevant for dataflow!

  • And then there's the call to malloc(2) in test4. Unfortunately, since we're not identifying the aliasing relationship between a and p->b we just find that the call to malloc(2) flows to a->b and flow stops there.

In contrast, this piece of code (where I swapped the order of the assignments to a->b and p->b) should work as expected:

void test6(test_big* p)
{
    test* a = malloc(200);
    a->b = malloc(2);
    p->b = a;
    return;
}

since we can follow the flow from malloc(2), to a->b, to p->b, and finally out of the function.

MathiasVP avatar Sep 26 '22 15:09 MathiasVP

@MathiasVP Thank you for your answer! I got it!

icy17 avatar Sep 27 '22 10:09 icy17