codepropertygraph

Argument-level granularity in data-flow tracking to calls

Open • jaiverma opened this issue 4 years ago • 2 comments

I am trying to track data flow to one specific argument of a function call. For example, consider the following snippet of code:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

int main() {
    uint32_t a = 28;
    uint32_t b = 42;
    uint32_t a_n = ntohl(a);
    uint32_t b_n = ntohl(b);

    char *buf;
    uint32_t offset = a_n + 5;

    memcpy(buf + offset, buf, b_n);
}

I want to get the data flow from calls to ntohl to the size argument of memcpy. So in the example, I would expect only the flow b_n = ntohl(b) -> ... -> memcpy(buf + offset, buf, b_n).

My query is:

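def networkToMemcpy() = {
    // Sources: calls to ntohs/ntohl/ntohll; sink: third (size) argument of memcpy.
    val source = cpg.call.name("ntoh(s|l|ll)")
    val sink = cpg.call.name("memcpy").argument(3)
    val paths = sink.reachableByFlows(source)
    // Render each flow as a list of (name, code, file, line) tuples.
    // The casts assume every element on the path is a Call node.
    paths.l.map(
        path => path.elements.map(
            node => (
                node.asInstanceOf[Call].name,
                node.asInstanceOf[Call].code,
                node.location.filename,
                node.location.lineNumber.map(_.toString).getOrElse("n/a")
            )
        )
    )
}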

The problem is that, apart from the expected flow, I also get a flow from a_n through buf + offset, which is the first argument of memcpy, not the third.

joern> networkToMemcpy
res100: List[List[(String, String, String, String)]] = List(
  List(
    ("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  ),
  List(
    ("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  )
)

It seems that the argument(3) step in val sink = cpg.call.name("memcpy").argument(3) doesn't change the result.

Is there currently a way of getting data-flow for just one argument of a call?
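
As long as the engine does not distinguish arguments at the sink, one possible stopgap is to post-filter the reported flows. The following is only a sketch in plain Scala (script or REPL input), not a Joern API: it takes the tuples printed above as literal data and drops any flow that passes through the code of another memcpy argument. The names Step, flows, otherArgumentCodes, and avoidsOtherArguments are hypothetical, and matching on code strings is a heuristic that assumes the unwanted argument expression appears as its own step in the flow.

```scala
// Hypothetical post-filter over the output of networkToMemcpy.
// This is plain Scala, not part of the Joern query language.
type Step = (String, String, String, String) // (name, code, file, line)

// Flows copied verbatim from the REPL output shown in this issue.
val flows: List[List[Step]] = List(
  List(
    ("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  ),
  List(
    ("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  )
)

// Assumption: the code strings of the memcpy arguments we are NOT interested in
// are known up front (here the destination and source arguments).
val otherArgumentCodes: Set[String] = Set("buf + offset", "buf")

// Keep a flow only if no step before the final sink call has the code of one
// of the other arguments.
def avoidsOtherArguments(flow: List[Step]): Boolean =
  !flow.dropRight(1).exists { case (_, code, _, _) => otherArgumentCodes.contains(code) }

val filtered: List[List[Step]] = flows.filter(avoidsOtherArguments)
```

On the data above, filtered keeps only the flow that starts at ntohl(b). An argument that is a bare identifier, such as buf, may never show up as a separate step, so this check can miss flows into it.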

jaiverma, May 10 '20 18:05

@jaiverma I recently made some changes to the data-flow engine that should address this problem. Could you retest this by any chance?

fabsx00, Sep 02 '20 10:09

Hi @fabsx00

I tested this again with Joern v1.1.1, but it seems to give the same result.

def networkToMemcpy() = {
    val source = cpg.call.name("ntoh(s|l|ll)")
    val sink = cpg.call.name("memcpy").argument(3)
    val paths = sink.reachableByFlows(source)
    paths.p
}

This still returns a flow into the first argument of memcpy.

joern> networkToMemcpy
res4: List[String] = List(
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(b)                      | 10        | main  | /tmp/c/arg/main.c|
| b_n = ntohl(b)                | 10        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
""",
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(a)                      | 9         | main  | /tmp/c/arg/main.c|
| a_n = ntohl(a)                | 9         | main  | /tmp/c/arg/main.c|
| a_n + 5                       | 13        | main  | /tmp/c/arg/main.c|
| offset = a_n + 5              | 13        | main  | /tmp/c/arg/main.c|
| buf + offset                  | 15        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
"""
)

jaiverma, Sep 03 '20 08:09