codeql
codeql copied to clipboard
Wrong global dataflow analyse in C
Hi! I write a small C code, and I want to find if there is a dataflow from malloc to free.And malloc and free in different functions.
But I got a wrong result. The result shows that only malloc in test5 has related dataflow, but I think malloc in test4 should in the result...I don't know if my ql code is wrong.
C code :
typedef struct{
int a;
char* b;
}test;
typedef struct{
int a;
test* b;
}test_big;
void test4(test_big* p)
{
test* a = malloc(200);
p->b = a;
a->b = malloc(2);
return;
}
void test5(test_big* p)
{
test* a;
a->b = malloc(2);
p->b = a;
return;
}
int main(void)
{
char* a;
test_big* b;
test4(b);
free(b->b->b);
test5(b);
free(b->b->b);
}
QL:
class TestConfiguration extends DataFlow::Configuration {
TestConfiguration() { this = "TestConfiguration" }
override predicate isSource(DataFlow::Node source) {
exists(FunctionCall fc |
fc.getTarget().hasName("malloc")
and (fc = source.asExpr())
)
}
override predicate isSink(DataFlow::Node sink) {
// sink.asExpr()
exists(FunctionCall fc |
fc.getTarget().hasName("free")
and fc.getAnArgument() = sink.asExpr()
)
}
}
from FunctionCall malloc, TestConfiguration cfg, DataFlow::Node source
where malloc.getTarget().hasName("malloc")
and source.asExpr() = malloc
and exists(Expr expr |
not expr.getEnclosingFunction() = malloc.getEnclosingFunction()
and cfg.hasFlow(source, DataFlow::exprNode(expr)))
Hi @icy17,
The good news: The code you've written looks fine!
The bad news: test4 is just a really difficult example for our dataflow library. The reason is as follows:
Consider the code in test4. There are two calls to malloc in that function.
-
First, the library finds that a call to
malloc(200)ends up inp->b. Then, after that assignment, there's an assignment toa->b. The dataflow library concludes that this assignment doesn't give any useful dataflow since we're currently tracking themalloccall that ended up inp->b, and this assignment writes toa->b. However, sincep->bandaalias due to the previous line this is in fact relevant for dataflow! -
And then there's the call to
malloc(2)intest4. Unfortunately, since we're not identifying the aliasing relationship betweenaandp->bwe just find that the call tomalloc(2)flows toa->band flow stops there.
In contrast, this piece of code (where I swapped the order of the assignments to a->b and p->b) should work as expected:
void test6(test_big* p)
{
test* a = malloc(200);
a->b = malloc(2);
p->b = a;
return;
}
since we can follow the flow from malloc(2), to a->b, to p->b, and finally out of the function.
@MathiasVP Thank you for your answer! I got it!