[dataflow] missing taint path when two dataflow edges reach one node

Open d1tto opened this issue 10 months ago • 1 comments

Describe the bug

<?php
$cats = $_POST['xxxxxxxxxx'];
$k = $cats;
$v = $cats;
$k = process("a", "b", $k);
$output = $k . $v;
echo $output;

In this example there should be two taint paths to the echo call, but Joern reports only one:

_______________________________________________________________________________
| nodeType   | tracked                      | lineNumber| method   | file      |
|==============================================================================|
| Call       | $cats = $_POST["xxxxxxxxxx"] | 4         | <global> | index.php |
| Identifier | $cats = $_POST["xxxxxxxxxx"] | 4         | <global> | index.php |
| Identifier | $k = $cats                   | 5         | <global> | index.php |
| Identifier | $k = $cats                   | 5         | <global> | index.php |
| Identifier | process("a","b",$k)          | 7         | <global> | index.php |
| Call       | process("a","b",$k)          | 7         | <global> | index.php |
| Identifier | $k = process("a","b",$k)     | 7         | <global> | index.php |
| Identifier | $k . $v                      | 8         | <global> | index.php |
| Call       | $k . $v                      | 8         | <global> | index.php |
| Identifier | $output = $k . $v            | 8         | <global> | index.php |
| Identifier | echo $output                 | 9         | <global> | index.php |
| Call       | echo $output                 | 9         | <global> | index.php |

To Reproduce

  1. set $_POST['xxx'] as source node
  2. set echo call as sink node
  3. run reachableByFlows

Expected behavior

Two taint paths should be reported: one reaching $output through $k and one reaching it through $v

Desktop (please complete the following information):

  • Windows
  • latest joern-dataflowengine
  • jdk17

d1tto, Apr 18 '24 14:04

It seems that the problem is caused by the deduplicateFinal method in the Engine class. After I remove this method, the result is complete.

_________________________________________________________________________
| nodeType   | tracked                      | line| method   | file      |
|========================================================================|
| Call       | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $k = $cats                   | 5   | <global> | index.php |
| Identifier | $k = $cats                   | 5   | <global> | index.php |
| Identifier | process("a","b",$k)          | 7   | <global> | index.php |
| Call       | process("a","b",$k)          | 7   | <global> | index.php |
| Identifier | $k = process("a","b",$k)     | 7   | <global> | index.php |
| Identifier | $k . $v                      | 8   | <global> | index.php |
| Call       | $k . $v                      | 8   | <global> | index.php |
| Identifier | $output = $k . $v            | 8   | <global> | index.php |
| Identifier | echo $output                 | 9   | <global> | index.php |
| Call       | echo $output                 | 9   | <global> | index.php |
_________________________________________________________________________
| nodeType   | tracked                      | line| method   | file      |
|========================================================================|
| Call       | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $v = $cats                   | 6   | <global> | index.php |
| Identifier | $v = $cats                   | 6   | <global> | index.php |
| Identifier | $k . $v                      | 8   | <global> | index.php |
| Identifier | $k . $v                      | 8   | <global> | index.php |
| Call       | $k . $v                      | 8   | <global> | index.php |
| Identifier | $output = $k . $v            | 8   | <global> | index.php |
| Identifier | echo $output                 | 9   | <global> | index.php |
| Call       | echo $output                 | 9   | <global> | index.php |
_________________________________________________________________________
| nodeType   | tracked                      | line| method   | file      |
|========================================================================|
| Call       | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $cats = $_POST["categories"] | 4   | <global> | index.php |
| Identifier | $v = $cats                   | 6   | <global> | index.php |
| Identifier | $v = $cats                   | 6   | <global> | index.php |
| Identifier | $k . $v                      | 8   | <global> | index.php |
| Call       | $k . $v                      | 8   | <global> | index.php |
| Identifier | $output = $k . $v            | 8   | <global> | index.php |
| Identifier | echo $output                 | 9   | <global> | index.php |
| Call       | echo $output                 | 9   | <global> | index.php |

I read the code and found that when several paths share the same source and sink, only the longest one is kept. Is it possible to provide a config option that returns the complete result to the user?
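To make the behaviour described above concrete, here is a minimal Python model of it. This is not Joern's code: it assumes, based only on the description in this comment, that paths are grouped by their first and last node and that the longest path in each group is kept. The names dedup_longest and report, and the deduplicate flag, are hypothetical and only illustrate what the requested option could look like.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Path = Tuple[str, ...]

# Simplified stand-ins for the two flows in the PHP example (not real Joern output).
via_k: Path = ("$_POST", "$cats", "$k", "process(...)", "$k", "$k . $v", "$output", "echo")
via_v: Path = ("$_POST", "$cats", "$v", "$k . $v", "$output", "echo")


def dedup_longest(paths: List[Path]) -> List[Path]:
    """Keep one path per (source, sink) pair: the longest one (assumed rule)."""
    groups: Dict[Tuple[str, str], List[Path]] = defaultdict(list)
    for p in paths:
        if p:  # skip empty paths, which have no source or sink
            groups[(p[0], p[-1])].append(p)
    return [max(group, key=len) for group in groups.values()]


def report(paths: List[Path], deduplicate: bool = True) -> List[Path]:
    """Hypothetical switch: with deduplicate=False, only exact duplicates are dropped."""
    if deduplicate:
        return dedup_longest(paths)
    return list(dict.fromkeys(paths))  # preserves order, removes identical paths


if __name__ == "__main__":
    print(len(report([via_k, via_v])))                     # shorter $v path is dropped
    print(len(report([via_k, via_v], deduplicate=False)))  # both paths survive
```

Both flows start at the same source and end at the same sink, so under this rule the shorter path through $v is discarded, which matches the single path in the original report.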

d1tto, Apr 18 '24 15:04