joern icon indicating copy to clipboard operation
joern copied to clipboard

Dataflow inconsistency related to reassignment and subcalls

Open jryerb opened this issue 2 years ago • 10 comments

For the following source code:

oid reassignThenFree(char * ptr)
{
    ptr = malloc(0x80);
    free(ptr);
    return;
}

void reassign(char * ptr)
{
    ptr = malloc(0x80);
    return;
}

int case0()
{
    char * data = malloc(0x100);
    free(data);
    free(data);

    return 0;
}

int case1()
{
    char * data = malloc(0x100);
    free(data);
    data = malloc(0x80);
    free(data);

    return 0;
}

int case2()
{
    char * data = malloc(0x100);
    free(data);
    reassignThenFree(data);

    return 0;
}

int case3()
{
    char * data = malloc(0x100);
    free(data);
    reassign(data);
    free(data);

    return 0;
}

The following queries and results are:

joern> importCode("./test") 
...
joern> run.ossdataflow 
...
joern> def sink = cpg.call("free").argument(1) 
defined function sink

joern> def source = cpg.call("free").argument(1) 
defined function source

joern> sink.reachableByFlows(source).filter(path => path.elements.size > 1).p 
res166: List[String] = List(
  """________________________________________________________________________________________________________________
| tracked        | lineNumber| method| file                                                                     |
|===============================================================================================================|
| free(data)     | 45        | case3 | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| reassign(data) | 46        | case3 | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| free(data)     | 47        | case3 | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
""",
  """____________________________________________________________________________________________________________
| tracked    | lineNumber| method| file                                                                     |
|===========================================================================================================|
| free(data) | 17        | case0 | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| free(data) | 18        | case0 | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
""",
  """_________________________________________________________________________________________________________________________________________
| tracked                      | lineNumber| method           | file                                                                     |
|========================================================================================================================================|
| free(data)                   | 36        | case2            | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| reassignThenFree(data)       | 37        | case2            | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| reassignThenFree(char * ptr) | 1         | reassignThenFree | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| ptr = malloc(0x80)           | 3         | reassignThenFree | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
| free(ptr)                    | 4         | reassignThenFree | /Users/jryerby/Documents/Query dev/RES-489/simple/called_func_behavior.c |
"""
)

So case 0 returns a result, while case 1 does not, presumably due to the reassignment. However in case 2, reassignment doesn't "break" the dataflow in the called function and despite the fact that case 2 prove that the dataflow engine is tracking the flow through subcalls, the dataflow originating in case 3 neither shows the reassignment in the subcall nor "breaks" the dataflow as suggested by case 1. I would expect that either the entire dataflow should be returned including reassignment or that the reassignment "breaks" the dataflow in all cases. Is this a use/option that needs to be enabled issue on my end or a true inconsistency?

jryerb avatar Feb 10 '22 22:02 jryerb

Nice example. The flow for case 2 is incorrect.

A reduced reproducer is the following:

void sink(char* p);
void taint1(char* p);
void taint2(char* p);


void outer(char* ptr){
  taint1(ptr);
  inner(ptr);
  return;
}

void inner(char * ptr)
{
    taint2(ptr);
    ptr = malloc(0x80);
    sink(ptr);
    return;
}

You can observe that (correctly!) there exist no flows from taint2 (neither cpg.call.name("taint2").argument not cpg.method.fullName("taint2").parameter.asOutput) to sink (neither cpg.call.sink.argument not cpg.method.fullName("sink").parameter).

Then we observe that (correctly!) there exists no flow from cpg.method.fullName("taint1").parameter.asOutput to sink (in either representation).

However, we get an incorrect flow from cpg.call("taint1").argument to sink (in either representation):

_____________________________________________________________________________________________________________
| tracked            | lineNumber| method| file                                                              |
|============================================================================================================|
| taint1(ptr)        | 8         | outer | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |
| inner(ptr)         | 9         | outer | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |
| inner(char * ptr)  | 13        | inner | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |
| ptr = malloc(0x80) | 16        | inner | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |
| sink(ptr)          | 17        | inner | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |
| sink(char* p)      | 2         | sink  | /tmp/semanticcpgtest7155610028977944934/Test3553749843102992678.c |

Now, I'm not very knowledgeable about the open source dataflow engine -- I'm working on the commercial engine, which is a different beast. @fabsx00 is the main author of the open source engine, but he's currently on vacation. @itsacoderepo are you knowledgeable about the OSS engine internals?

This will take a week until fabs is back. Then he will either take this over, or I can at least force him to explain the code to me while I debug this.

bbrehm avatar Feb 11 '22 13:02 bbrehm

Yep, I can reproduce this. Let me take a look.

fabsx00 avatar Feb 17 '22 13:02 fabsx00

I've tracked it down to an overtainting for method parameters in DDG calculation but I'll need some more time tomorrow for this one.

fabsx00 avatar Feb 17 '22 15:02 fabsx00

https://github.com/joernio/joern/pull/1031

fabsx00 avatar Feb 20 '22 10:02 fabsx00

There was a bug in the data flow for functions that redefined their parameters. I fixed that now and I hope this also fixes your problem. If you could give it a try, that would be much appreciated.

fabsx00 avatar Feb 20 '22 10:02 fabsx00

After I upgraded to the latest joern bbrehm's mrp example is now fixed. Curiously however when running my original test cases, including cordoning off case2 and reassignThenFree to their own file to more closely track bbrehm's example, I still get the results I reported in the original issue post. So I'm not sure what's up.

jryerb avatar Feb 22 '22 21:02 jryerb

Ok, then I guess I'll actually have to read your original report :) Putting it onto my stack, should find some time this week for it.

fabsx00 avatar Feb 23 '22 11:02 fabsx00

I understand. No worries. If it makes you feel any better I was surprised that the change fixed bbrehm's mrp and not my cases.

jryerb avatar Feb 23 '22 15:02 jryerb

Is there any update on this one? Thanks.

jryerb avatar Mar 08 '22 22:03 jryerb

I've done some more digging into this issue and what I have discovered is that when the call to taint2(ptr) is removed from the inner function in @bbrehm 's example, the incorrect flow returns. Example:

void sink(char* p);
void taint1(char* p);
void taint2(char* p);

void outer(char* ptr){
  taint1(ptr);
  inner(ptr);
  return;
}

void inner(char * ptr)
{
    //taint2(ptr);
    ptr = malloc(0x80);
    sink(ptr);
    return;
}

Joern result:

joern> def source = cpg.call("taint1").argument 
defined function source

joern> def sink = cpg.call("sink").argument 
defined function sink

joern> sink.reachableByFlows(source).p 
res10: List[String] = List(
  """________________________________________________________________________________________________________
| tracked            | lineNumber| method| file                                                         |
|=======================================================================================================|
| taint1(ptr)        | 7         | outer | /Users/jryerby/Documents/Query dev/RES-489/simple/sls_test.c |
| inner(ptr)         | 8         | outer | /Users/jryerby/Documents/Query dev/RES-489/simple/sls_test.c |
| inner(char * ptr)  | 12        | inner | /Users/jryerby/Documents/Query dev/RES-489/simple/sls_test.c |
| ptr = malloc(0x80) | 15        | inner | /Users/jryerby/Documents/Query dev/RES-489/simple/sls_test.c |
| sink(ptr)          | 16        | inner | /Users/jryerby/Documents/Query dev/RES-489/simple/sls_test.c |
"""
)

I ran additional test cases with accessing the passed in ptr and also a functional call that doesn't use ptr as an argument before the reassignment of pointer and all those cases also returned the incorrect flow.

jryerb avatar Mar 21 '22 20:03 jryerb