[Bug] - missing dataflows with chained field access call on LHS side of assignment.
I was trying to get the flows in specific use cases from Javascript repo
sample 1 - This is the first issue.
function sayHello(req) {
var payload = {}
payload.publisher.id = "someid";
payload.data = req.data;
}
I tried to get the flows from someid LITERAL to payload IDENTIFIER with the following queries.
val src = cpg.literal("\"someid\"").l
val sink = cpg.identifier("payload").l
sink.reachableByFlows(src).p
I didn't get any flows. However when I tried with below sample reducing the number of field access to only one on LHS side on line no 3
Sample 2
function sayHello(req) {
var payload = {}
payload.publisher = "someid";
payload.data = req.data;
}
with the same set of queries as above. I get the flow as below. However, surprisingly I don’t see a payload IDENTIFIER from line no 5 itself.
val res84: List[String] = List(
"""____________________________________________________________________________
| nodeType | tracked | lineNumber| method | file |
|===========================================================================|
| Literal | payload.publisher = "someid" | 5 | sayHello | tmp.js |
| Call | payload.publisher = "someid" | 5 | sayHello | tmp.js |
| Identifier | payload.data = req.data | 6 | sayHello | tmp.js |
"""
)
So I tried with the below 3rd sample by removing the last statement in the code.
Sample 3 - This one is second issue
function sayHello(req) {
var payload = {}
payload.publisher = "someid";
}
In this case, with the same query, I don’t get the flows from someid LITERAL to payload IDENTIFIER. Which I think should exist.
I tried with a similar java sample and I observed similar behaviour there as well.
sample 1.
public class Test {
public void sayHello(req Request) {
Payload payload = new Payload();
payload.publisher.id = "someid";
payload.data = req.data;
}
}
Sample 2
public class Test {
public void sayHello(req Request) {
Payload payload = new Payload();
payload.publisher = "someid";
payload.data = req.data;
}
}
Sample 3
public class Test {
public void sayHello(req Request) {
Payload payload = new Payload();
payload.publisher = "someid";
}
}
For reference, here is a DDG:
Seems like there aren't individual flows going to the identifier bases themselves, I wonder if this wouldn't be a problem if this was 3-address code? This looks pretty tricky, nothing obvious from a bit of initial debugging
@DavidBakerEffendi what is the solution?
cc: @fabsx00
@pandurangpatil It appears to require a re-design/re-visit of io.joern.dataflowengineoss.queryengine.AccessPathUsage, since this seems to be what decides how containers are tainted
I require a bit of context as to what code might be the culprit, and I see @mpollmeier on the Git blame, but Fabs also told me @bbrehm designed some access path code. Do any of you have suggestions or ideas on how to address the problem above as it seems to be language independent in general.
🤔 as much as I'd love to help here, I merely integrated the code from a separate former ghidra2cpg build into joern: https://github.com/joernio/joern/pull/692 This was 2 years ago and I have no idea where it came from :(
io.joern.dataflowengineoss.passes.reachingdef.DdgGenerator.[isUsing|isContainer|isPart] is probably where we'd make this fix, in which case it may not be super difficult. The DdgGenerator uses io.joern.dataflowengineoss.queryengine.AccessPathUsage which is where the connection is from.