codepropertygraph
Argument level granularity in data-flow tracking to calls
I am trying to track data flow to one specific argument of a function call. For example, consider the following snippet of code:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

int main() {
    uint32_t a = 28;
    uint32_t b = 42;
    uint32_t a_n = ntohl(a);
    uint32_t b_n = ntohl(b);

    char *buf;
    uint32_t offset = a_n + 5;

    memcpy(buf + offset, buf, b_n);
}
I want to get the data flow from calls to ntohl to the size argument of memcpy. So in the example, I would expect the flow b_n = ntohl(b) -> ... -> memcpy(buf + offset, buf, b_n).
My query is:
def networkToMemcpy() = {
  val source = cpg.call.name("ntoh(s|l|ll)")
  val sink = cpg.call.name("memcpy").argument(3)
  val paths = sink.reachableByFlows(source)
  paths.l.map(
    l => l.elements.map(
      call => (
        call.asInstanceOf[Call].name,
        call.asInstanceOf[Call].code,
        call.location.filename,
        call.location.lineNumber match {
          case Some(n) => n.toString
          case None => "n/a"
        }
      )
    )
  )
}
The problem is that, apart from the expected flow, I also get a flow from the identifier a_n into buf + offset, which is the first argument of memcpy.
joern> networkToMemcpy
res100: List[List[(String, String, String, String)]] = List(
  List(
    ("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  ),
  List(
    ("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  )
)
It seems that the argument step in val sink = cpg.call.name("memcpy").argument(3) has no effect on the result.
Is there currently a way of getting data-flow for just one argument of a call?
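To make the distinction concrete, here is a toy model of the def-use chains in the snippet above. It is plain Python and only a sketch: it does not use Joern's API or reproduce its algorithm, and the names DEFS, MEMCPY_ARGS, sources_of, argument_level and call_level are made up for illustration. It shows the result I would expect when the sink is a single argument, compared with what you get when the sink is the whole call.

```python
# Toy def-use model of the C snippet. This is NOT Joern's API or algorithm;
# all names here are hypothetical and exist only for illustration.

# Each variable maps to the things its value is computed from.
DEFS = {
    "a_n": ["ntohl(a)"],   # a_n = ntohl(a)
    "b_n": ["ntohl(b)"],   # b_n = ntohl(b)
    "offset": ["a_n"],     # offset = a_n + 5
    "buf": [],             # declared but never assigned
}

# Variables used by each argument of memcpy(buf + offset, buf, b_n), 1-indexed.
MEMCPY_ARGS = {1: ["buf", "offset"], 2: ["buf"], 3: ["b_n"]}


def sources_of(names):
    """Return the set of ntohl(...) calls the given variables depend on."""
    found, seen, todo = set(), set(), list(names)
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        if name.startswith("ntohl("):
            found.add(name)          # reached a source call
        else:
            todo.extend(DEFS.get(name, []))  # follow the definition backwards
    return found


def argument_level(index):
    """Sources reaching one specific memcpy argument."""
    return sources_of(MEMCPY_ARGS[index])


def call_level():
    """Sources reaching memcpy when all arguments are lumped together."""
    all_vars = [v for used in MEMCPY_ARGS.values() for v in used]
    return sources_of(all_vars)


if __name__ == "__main__":
    print(sorted(argument_level(3)))  # ['ntohl(b)']
    print(sorted(call_level()))       # ['ntohl(a)', 'ntohl(b)']
```

In this model, argument_level(3) is the answer I am after, while call_level() matches the two flows that the query currently reports.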
@jaiverma I recently made some changes to the data-flow engine that should address this problem. Could you retest this by any chance?
Hi @fabsx00
I tested this again with Joern v1.1.1, but it seems to give the same result.
def networkToMemcpy() = {
  val source = cpg.call.name("ntoh(s|l|ll)")
  val sink = cpg.call.name("memcpy").argument(3)
  val paths = sink.reachableByFlows(source)
  paths.p
}
This still returns a flow into the first argument of memcpy.
joern> networkToMemcpy
res4: List[String] = List(
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(b)                      | 10        | main  | /tmp/c/arg/main.c|
| b_n = ntohl(b)                | 10        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
""",
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(a)                      | 9         | main  | /tmp/c/arg/main.c|
| a_n = ntohl(a)                | 9         | main  | /tmp/c/arg/main.c|
| a_n + 5                       | 13        | main  | /tmp/c/arg/main.c|
| offset = a_n + 5              | 13        | main  | /tmp/c/arg/main.c|
| buf + offset                  | 15        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
"""
)