codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Java: QL Query to Detect Security Sensitive non-CSPRNG usage

Open JLLeitschuh opened this issue 5 years ago • 12 comments

GitHub Security Lab BB Submission

The goal of this query is to detect the use of a PRNG like java.util.Random, org.apache.commons.lang.RandomStringUtils, org.apache.commons.text.RandomStringGenerator, or java.util.concurrent.ThreadLocalRandom in a security sensitive context.

This vulnerability can have up to a CVSSv3 score of 9.8/10 depending upon the use of the insecure data generated.

Design

The design of the query is to find strings that are generated with some sort of insecure randomess component. Then use taint tracking to see if that string ever taints a value that has some sort of security sensitive indicator.

Currently the indicators I'm using are variables/methods that (when lowercase) match a name with the following predicate.

/**
 * A 'Sensitive Name' is something that indicates that the name has some sort of security
 * implication associated with it's usage.
 */
bindingset[name]
private predicate isSensitiveName(string name) {
  name.matches("%pass%") // also matches password
  or
  name.matches("%tok%") // also matches token
  or
  name.matches("%secret%")
  or
  name.matches("%reset%key%") and not name.matches("%value%") // Reduce false positive from 'keyValue' type of methods from maps
  or
  name.matches("%cred%") // also matches credentials
  or
  name.matches("%auth%") // also matches authentication
  or
  name.matches("%sess%id%") // also matches sessionid
}

I've debated adding %accnt%, %account%, and %trusted% as well.

Query Current Status

I'm currently not totally happy with the false-positive rate that this query produces and I'm looking for support/suggestions on how to increase its accuracy.

Additionally, because my 'sink' locations are inherently fuzzy you end up with multiple sink locations for the same source.

Determining whether or not a Random instance is actually a SecureRandom type at runtime is also inherently difficult. Currently, I'm only considering a Random variable 'safe' if at any point it's assigned as a final variable.

For example, these are 'safe' random values:

class Tester {
    final Random random = new SecureRandom();
    final Random random2;

    Tester() {
        this.random2 = new SecureRandom();
    }
}

If there's a better way to guess the type of a Random expression, I'd love to use it.

Known False Positives

Eclipse/Jetty

The way that Jetty sets it's random number generator is inherently 'safe', in probably 99% of cases.

https://github.com/eclipse/jetty.project/blob/69808d3851de31ffffb343115030d7bcf826ed01/jetty-server/src/main/java/org/eclipse/jetty/server/session/DefaultSessionIdManager.java#L367-L384

Query Results: https://lgtm.com/query/1859599740827953942/

Apache/shiro

https://github.com/apache/shiro/blob/cb3a1b3c0dcd8b89cc8d51ee9ede086882921faa/core/src/main/java/org/apache/shiro/session/mgt/eis/RandomSessionIdGenerator.java

Random random type is ambiguous in my query because there's a setRandom method that could be called by the user.

Apache/cloudstack

https://github.com/apache/cloudstack/blob/ac202639a5108d126ecea585c7e793696679dbe4/utils/src/main/java/com/cloud/utils/PasswordGenerator.java#L53-L107

Can't track that the generateLowercaseChar will always be passed a SecureRandom instance.

Elasticsearch/Elasticsearch

https://github.com/elastic/elasticsearch/blob/a5c40b3d828fe6cc5d722ec134305864d31c6515/server/src/main/java/org/elasticsearch/common/RandomBasedUUIDGenerator.java

Can't track that the Random instance passed to getUUIDBytes will always be an instance of SecureRandom.

More... TODO

Test Repository

I've been testing this query against the code I've been uploading to this repository. Ideally, this repository would get contributed as a part of the tests for this PR when it's complete.

See: https://github.com/JLLeitschuh/bad-random-examples

JLLeitschuh avatar Jan 24 '20 21:01 JLLeitschuh

If there's a better way to guess the type of a Random expression, I'd love to use it.

We have the TypeFlow library, which can do a bit in terms of giving improved type bounds for expressions and fields. It's currently mostly used as part of the virtual dispatch resolution, but it should also be a fine stand-alone library. Just import semmle.code.java.dataflow.TypeFlow and try out the predicate exprTypeFlow(Expr e, RefType t, boolean exact) and predicate fieldTypeFlow(Field f, RefType t, boolean exact) predicates. Note that those predicates only have results for expressions and fields for which the library has determined a type bound that's likely to be better than the static type.

aschackmull avatar Jan 28 '20 13:01 aschackmull

There are a couple of similar "insecure randomness" queries for other languages. Prior to merging, we should consider whether to give this query the same ID (java/insecure-randomness), or whether there is a good reason to differentiate this query from those in the other languages.

yo-h avatar Jan 28 '20 16:01 yo-h

@aschackmull The exprTypeFlow was a good tip, thanks.

Got anything for this?

private static final SecureRandom RANDOM_INSTANCE = new SecureRandom();
private String generateString(Random random) {
 byte[] array = new byte[];
 random.nextBytes(array);
 return new String(array);
}

public String generateSecure() { // Thinks this method is tainted
  return generateString(RANDOM_INSTANCE);
}

I'm running into this case still with Elasticsearch: https://github.com/elastic/elasticsearch/blob/a5c40b3d828fe6cc5d722ec134305864d31c6515/server/src/main/java/org/elasticsearch/common/RandomBasedUUIDGenerator.java

JLLeitschuh avatar Jan 29 '20 02:01 JLLeitschuh

Your example code snippet should work with exprTypeFlow. However, a prerequisite for inferring a type bound for the parameter is that generateString is private - otherwise it might be called from someplace else with something that isn't a SecureRandom. Abstracting a bit, the typeflow library is designed for universal flow, i.e. merge points in the flow graph are treated with universal quantification, whereas the general dataflow library essentially uses existential quantification. If you're generating too many FPs, it might be worth it to consider whether a result should be ruled out by the existence of a SecureRandom that reaches the relevant node, instead of trying to universally prove that all Randoms that get to that point are in fact SecureRandom. This would be slightly more heuristic, but might work better if you're seeing too many FPs.

aschackmull avatar Jan 29 '20 13:01 aschackmull

In the ElasticSearch example, the fact that getBase64UUID is public means that we cannot give a tighter bound on the type of the parameter in getUUIDBytes.

aschackmull avatar Jan 29 '20 13:01 aschackmull

If the relevant FP in ElasticSearch is only in getBase64UUIDSecureString via a data flow return from getUUIDBytes then it is theoretically possible to combine the type flow analysis with the subgraph of data flow related to the source-sink pair and get a better type bound that way. I have plans to do this in order to improve virtual dispatch in data flow, but for this use case it would probably need to be exposed in a slightly different way. These sorts of extensions to data flow are, however, quite non-trivial.

aschackmull avatar Jan 29 '20 13:01 aschackmull

An alternative approach is to perhaps make method calls like nextBytes into additional steps instead of sources, and trace the source further back to a constructor call that definitely isn't new SecureRandom, but rather e.g. new Random. That would deal with the ElasticSearch FP, I think.

aschackmull avatar Jan 29 '20 13:01 aschackmull

This is still a WIP, I'll be coming back to it soon.

JLLeitschuh avatar Feb 11 '20 12:02 JLLeitschuh

This is still a WIP, I'll be coming back to it soon.

Hey @JLLeitschuh do you intend to come back to this PR?

xcorail avatar Jul 08 '20 23:07 xcorail

Hey @JLLeitschuh do you intend to come back to this PR?

Very much so. It's been on the back-burner of my mind for the past few months. I probably won't have time/energy to finish it until after August.

JLLeitschuh avatar Jul 08 '20 23:07 JLLeitschuh

I will get back to this eventually

JLLeitschuh avatar Jan 21 '21 13:01 JLLeitschuh

@egregius313 rather old query, but you may find some things valuable, and worth including in the current variant of this query that ended up being merged.

JLLeitschuh avatar Feb 06 '24 23:02 JLLeitschuh