codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Java: sanitize values which are checked against an allowlist using java.util.List.contains or java.util.Set.contains

Open owen-mc opened this issue 1 year ago • 5 comments
trafficstars

Checking that a value is in a set of compile-time constant values should be a sanitizer, because what was untrusted data has now been checked to be in a known set of values. It is hard in CodeQL to identify when this happens, as there are many ways of checking that a value is in a known set of values. This PR introduces the framework for this sanitizer and implements it for a small number of cases:

  • currently only a call to contains on a java.util.List or java.util.Set which contains only compile-time constant values
    • if the argument to contains is x.toLowerCase() or x.toUpperCase() then the value that we sanitize is x
  • the allowlist (qualifier of contains) must either be
    • an immutable allowlist created using List.of(...), Set.of(...), Collections.unmodifiableSet(Arrays.asList(...)) or Collections.unmodifiableList(Arrays.asList(...)), either created locally or read from a final static field.
    • a locally constructed List or Set (via a constructor or Arrays.asList(...))
      • which may have had compile-time constant elements added using add, addFirst or addLast
      • which does not have any other methods called on it, and which is not an argument to any other methods
      • which is not captured in any lambda (as this might lead to non-compile time constant elements being added to it).

owen-mc avatar Jul 23 '24 14:07 owen-mc

I should run QA to check alert changes before this is merged. I have run DCA but I don't have enough experience with interpreting java DCA results to know if it is indicative of a performance problem.

owen-mc avatar Oct 09 '24 09:10 owen-mc

Here is one problem I've seen while looking through MRVA results (slightly adapted from core/src/test/java/com/alibaba/druid/bvt/sql/odps/issues/Issue4933.java in alibaba/druid). We don't recognise that the following code might add a non-constant string to tables.

List<String> tables = new ArrayList<>();
SQLUtils.acceptTableSource(
        sql,
        DbType.odps,
        e -> tables.add(((SQLExprTableSource)e).getTableName()),
        e -> e instanceof SQLExprTableSource
);
if(tables.contains("src")) {

Update: I've addressed this now.

owen-mc avatar Oct 09 '24 14:10 owen-mc

Alright, I'm going to step back a bit and try to gather a higher-level picture (there are probably more details I could comment on). What we want is to identify expressions that are always sets/lists of known constants. This is a universal flow problem and the solution is generally expected to be a small set. Any use of local data flow is existential rather than universal flow, so that is only really valid in the dual setting, i.e. under a negation, where we identify things that are not constant, but that's generally a huge set, so probably a bad strategy. So I'd recommend reworking this as a positive recursive definition that avoids any use of existential data flow. The base cases are simple and the recursive steps can take a variety of forms including library calls like Arrays.asList and SSA flow.

aschackmull avatar Oct 17 '24 09:10 aschackmull

Thank you for the thorough review, @aschackmull . I have addressed the superficial comments as it is easy to do so. I will now try to rewrite it as you suggest, which will take longer. Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it.

owen-mc avatar Oct 17 '24 13:10 owen-mc

Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it.

The Java TypeFlow library is an example of a universal flow calculation. I was actually thinking that it might be useful to extract the "universal flow" part of that as a new individual shared library, and then use that.

Aside: The "collection might reach update with non-constant" will need existential flow as it is only naturally expressed as a negation in the positive formulation of constants and collections-of-constants.

aschackmull avatar Oct 17 '24 13:10 aschackmull

I've put up a PR with a reimplementation of this in terms of universal flow here: https://github.com/github/codeql/pull/17901. It builds on top of https://github.com/github/codeql/pull/17863. I've compared results to the implementation given in this PR, which has allowed me to iron out a couple of bugs and add useful improvements. This also identified several FPs in this PR.

aschackmull avatar Nov 04 '24 13:11 aschackmull

Superseded by https://github.com/github/codeql/pull/17901

aschackmull avatar Nov 29 '24 07:11 aschackmull