codeql
codeql copied to clipboard
Java: sanitize values which are checked against an allowlist using java.util.List.contains or java.util.Set.contains
Checking that a value is in a set of compile-time constant values should be a sanitizer, because what was untrusted data has now been checked to be in a known set of values. It is hard in CodeQL to identify when this happens, as there are many ways of checking that a value is in a known set of values. This PR introduces the framework for this sanitizer and implements it for a small number of cases:
- currently only a call to
containson ajava.util.Listorjava.util.Setwhich contains only compile-time constant values- if the argument to
containsisx.toLowerCase()orx.toUpperCase()then the value that we sanitize isx
- if the argument to
- the allowlist (qualifier of
contains) must either be- an immutable allowlist created using
List.of(...),Set.of(...),Collections.unmodifiableSet(Arrays.asList(...))orCollections.unmodifiableList(Arrays.asList(...)), either created locally or read from a final static field. - a locally constructed
ListorSet(via a constructor orArrays.asList(...))- which may have had compile-time constant elements added using
add,addFirstoraddLast - which does not have any other methods called on it, and which is not an argument to any other methods
- which is not captured in any lambda (as this might lead to non-compile time constant elements being added to it).
- which may have had compile-time constant elements added using
- an immutable allowlist created using
I should run QA to check alert changes before this is merged. I have run DCA but I don't have enough experience with interpreting java DCA results to know if it is indicative of a performance problem.
Here is one problem I've seen while looking through MRVA results (slightly adapted from core/src/test/java/com/alibaba/druid/bvt/sql/odps/issues/Issue4933.java in alibaba/druid). We don't recognise that the following code might add a non-constant string to tables.
List<String> tables = new ArrayList<>();
SQLUtils.acceptTableSource(
sql,
DbType.odps,
e -> tables.add(((SQLExprTableSource)e).getTableName()),
e -> e instanceof SQLExprTableSource
);
if(tables.contains("src")) {
Update: I've addressed this now.
Alright, I'm going to step back a bit and try to gather a higher-level picture (there are probably more details I could comment on). What we want is to identify expressions that are always sets/lists of known constants. This is a universal flow problem and the solution is generally expected to be a small set. Any use of local data flow is existential rather than universal flow, so that is only really valid in the dual setting, i.e. under a negation, where we identify things that are not constant, but that's generally a huge set, so probably a bad strategy.
So I'd recommend reworking this as a positive recursive definition that avoids any use of existential data flow. The base cases are simple and the recursive steps can take a variety of forms including library calls like Arrays.asList and SSA flow.
Thank you for the thorough review, @aschackmull . I have addressed the superficial comments as it is easy to do so. I will now try to rewrite it as you suggest, which will take longer. Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it.
Do you have any examples of anything written in the format you are suggesting, which I can refer to? I'm not immediately sure how to structure it.
The Java TypeFlow library is an example of a universal flow calculation. I was actually thinking that it might be useful to extract the "universal flow" part of that as a new individual shared library, and then use that.
Aside: The "collection might reach update with non-constant" will need existential flow as it is only naturally expressed as a negation in the positive formulation of constants and collections-of-constants.
I've put up a PR with a reimplementation of this in terms of universal flow here: https://github.com/github/codeql/pull/17901. It builds on top of https://github.com/github/codeql/pull/17863. I've compared results to the implementation given in this PR, which has allowed me to iron out a couple of bugs and add useful improvements. This also identified several FPs in this PR.
Superseded by https://github.com/github/codeql/pull/17901