codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Java: Taint flows backwards for array element

Open Marcono1234 opened this issue 1 year ago • 5 comments
trafficstars

Version

CodeQL extension version: 1.12.0 
CodeQL CLI version: 2.15.5 
Platform: win32 x64

Dependencies:

  • codeql/java-all: 0.8.5

Description

It looks like taint tracking erroneously reports that taint flows "backwards" from an array element assignment. For example:

void arrayAssign() {
    String[] s = {"a"};
    s[0] = source(sink(s[0]));
}

Here it erroneously reports that there is taint flow from source to sink, even though sink is actually executed before source, so there should not be any flow between them.

Here is also real world code where this occurs: apache/logging-log4j2 (my query considered the output of String.format as source, and an argument to String.format as sink)

Steps to reproduce

Java code:

class FlowTest {
    <T> T source(T o) {
        return o;
    }

    <T> T sink(T o) {
        return o;
    }

    void varAssign() {
        String s = "a";
        s = source(sink(s));
    }

    void arrayAssign() {
        String[] s = {"a"};
        // Erroneously reports flow from `source` to `sink`
        s[0] = source(sink(s[0]));
    }
}

CodeQL query:

/**
 * @kind problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

class Source extends DataFlow::Node {
  Source() {
    exists(Method m |
      m = this.asExpr().(MethodCall).getMethod().getSourceDeclaration() and
      m.hasName("source") and
      m.fromSource()
    )
  }
}

class Sink extends DataFlow::Node {
  Sink() {
    exists(MethodCall call, Method m |
      call.getAnArgument() = this.asExpr() and
      m = call.getMethod().getSourceDeclaration() and
      m.hasName("sink") and
      m.fromSource()
    )
  }
}

from Source source, Sink sink, string type
where
  type = "dataflow" and DataFlow::localFlow(source, sink)
  or
  type = "taint" and TaintTracking::localTaint(source, sink)
select sink, type + " from $@", source, "source"

Marcono1234 avatar Jan 15 '24 01:01 Marcono1234

Investigated this a little: the cause is our control-flow graph reflecting the order of events according to the JLS:

If any of the expressions in an array assignment have side-effects, the ordering is:

first()[second()] = third();

However of course once all the necessary arguments are evaluated, the last step is to assign the RHS value to the array cell, so our reading that data will flow from the RHS (source) to the array assignment (store step) to the use of the array on the RHS (apparent subsequent use step) is not a valid grounds for propagating taint in that order.

Note that source consuming data from sink is not important: this example works too:

class FlowTest {
    String source() {
        return "tainted";
    }

    String sink(String arg) {
        return arg;
    }

    void arrayAssign() {
        String[] s = {"a"};
        // Erroneously reports flow from `source` to `sink`
        s[0] = source() + sink(s[0]);
    }
}

Here the flow is source() -> op+ -> s[0] -> s (lhs use) -> s (rhs use, correctly reflecting the order of the JVM's references to s), -> s[0] -> sink

smowton avatar Jan 15 '24 14:01 smowton

Hi @Marcono1234,

Thanks again for your report. I believe @smowton answered your question so I'm proceeding by closing this issue.

If you have any follow-up questions, feel free to re-open this issue.

Thanks!

rvermeulen avatar Oct 15 '24 21:10 rvermeulen

@rvermeulen, if I understood @smowton correctly they mainly acknowledged that this is a bug and explained the reason (implementation using JLS evaluation order instead of taint flow order):

so our reading that data will flow from the RHS (source) to the array assignment (store step) to the use of the array on the RHS (apparent subsequent use step) is not a valid grounds for propagating taint in that order

This is labelled as https://github.com/github/codeql/labels/question because there is no issue template for https://github.com/github/codeql/labels/bug; one of the maintainers has to set it manually it seems.


feel free to re-open this issue

I can't: If someone other than the author closes an issue, then the author cannot reopen it again (unless they are a member of the repository).

Could you please reopen this? This issue still occurs for CodeQL 2.19.1 (codeql/java-all 4.1.0) with the original reproduction steps provided above.

Marcono1234 avatar Oct 15 '24 22:10 Marcono1234

Reopened. You're correct that this is still an issue - it's just not a very high priority, as it's probably not a large source of FP flow, and It's a bit finicky to fix. But it's noted and appreciated, so if I find some spare time for something like this, then I might have a go at fixing it.

aschackmull avatar Oct 16 '24 07:10 aschackmull

@Marcono1234 excuses, I wasn't aware it couldn't be reopened.

Thanks @aschackmull for the follow-up. I will assign the bug label and keep it open even though it might not end up being addressed.

rvermeulen avatar Oct 16 '24 17:10 rvermeulen