[dataflow] missing taint path when two dataflow edges reach one node
Describe the bug
<?php
$cats = $_POST['xxxxxxxxxx'];
$k = $cats;
$v = $cats;
$k = process("a", "b", $k);
$output = $k . $v;
echo $output;
In this example, there should be two taint paths to the echo call, while joern outputs only one:
_______________________________________________________________________________
| nodeType | tracked | lineNumber| method | file |
|==============================================================================|
| Call | $cats = $_POST["xxxxxxxxxx"] | 4 | <global> | index.php |
| Identifier | $cats = $_POST["xxxxxxxxxx"] | 4 | <global> | index.php |
| Identifier | $k = $cats | 5 | <global> | index.php |
| Identifier | $k = $cats | 5 | <global> | index.php |
| Identifier | process("a","b",$k) | 7 | <global> | index.php |
| Call | process("a","b",$k) | 7 | <global> | index.php |
| Identifier | $k = process("a","b",$k) | 7 | <global> | index.php |
| Identifier | $k . $v | 8 | <global> | index.php |
| Call | $k . $v | 8 | <global> | index.php |
| Identifier | $output = $k . $v | 8 | <global> | index.php |
| Identifier | echo $output | 9 | <global> | index.php |
| Call | echo $output | 9 | <global> | index.php |
To Reproduce
- set $_POST['xxx'] as source node
- set echo call as sink node
- run reachableByFlows
Expected behavior
Two taint paths: one from $k to $output and one from $v to $output
Desktop (please complete the following information):
- Windows
- latest joern-dataflowengine
- jdk17
It seems that the problem is caused by the deduplicateFinal method in the Engine class. After I remove this method, the result is complete:
_________________________________________________________________________
| nodeType | tracked | line| method | file |
|========================================================================|
| Call | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $k = $cats | 5 | <global> | index.php |
| Identifier | $k = $cats | 5 | <global> | index.php |
| Identifier | process("a","b",$k) | 7 | <global> | index.php |
| Call | process("a","b",$k) | 7 | <global> | index.php |
| Identifier | $k = process("a","b",$k) | 7 | <global> | index.php |
| Identifier | $k . $v | 8 | <global> | index.php |
| Call | $k . $v | 8 | <global> | index.php |
| Identifier | $output = $k . $v | 8 | <global> | index.php |
| Identifier | echo $output | 9 | <global> | index.php |
| Call | echo $output | 9 | <global> | index.php |
_________________________________________________________________________
| nodeType | tracked | line| method | file |
|========================================================================|
| Call | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $v = $cats | 6 | <global> | index.php |
| Identifier | $v = $cats | 6 | <global> | index.php |
| Identifier | $k . $v | 8 | <global> | index.php |
| Identifier | $k . $v | 8 | <global> | index.php |
| Call | $k . $v | 8 | <global> | index.php |
| Identifier | $output = $k . $v | 8 | <global> | index.php |
| Identifier | echo $output | 9 | <global> | index.php |
| Call | echo $output | 9 | <global> | index.php |
_________________________________________________________________________
| nodeType | tracked | line| method | file |
|========================================================================|
| Call | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4 | <global> | index.php |
| Identifier | $v = $cats | 6 | <global> | index.php |
| Identifier | $v = $cats | 6 | <global> | index.php |
| Identifier | $k . $v | 8 | <global> | index.php |
| Call | $k . $v | 8 | <global> | index.php |
| Identifier | $output = $k . $v | 8 | <global> | index.php |
| Identifier | echo $output | 9 | <global> | index.php |
| Call | echo $output | 9 | <global> | index.php |
I read the code and found that, for paths sharing the same source and sink, only the longest path is kept. Is it possible to provide a config option that returns the complete result to the user?
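
To illustrate what I mean, here is a small Python model of the behavior as I understand it. It is not Joern's actual code: the path tuples are simplified labels taken from the tables above, and `dedup_longest`, `collect` and the `keep_all` flag are hypothetical names I made up for the sketch. It only shows how grouping by (source, sink) and keeping the longest path drops the flows through $v, and what an opt-out option could look like.

```python
from collections import defaultdict

# Simplified stand-ins for the three flows printed above
# (labels only; real flows are sequences of CPG nodes).
paths = [
    ("$_POST", "$cats", "$k", "process(...)", "$k", "$k . $v", "$output", "echo"),
    ("$_POST", "$cats", "$v", "$v", "$k . $v", "$output", "echo"),
    ("$_POST", "$cats", "$v", "$k . $v", "$output", "echo"),
]


def dedup_longest(flows):
    """Group flows by (first step, last step) and keep only the longest one."""
    groups = defaultdict(list)
    for flow in flows:
        groups[(flow[0], flow[-1])].append(flow)
    return [max(group, key=len) for group in groups.values()]


def collect(flows, keep_all=False):
    """Hypothetical option: skip deduplication when keep_all is True."""
    return list(flows) if keep_all else dedup_longest(flows)


if __name__ == "__main__":
    print(len(collect(paths)))                 # deduplicated result
    print(len(collect(paths, keep_all=True)))  # complete result
```

With deduplication only the flow through $k survives, because all three flows share the same source and sink and it is the longest. With `keep_all=True` every flow is returned, which matches the output I get after removing deduplicateFinal.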