codeql
codeql copied to clipboard
Java: QL Query to Detect Security Sensitive non-CSPRNG usage
GitHub Security Lab BB Submission
The goal of this query is to detect the use of a PRNG like java.util.Random, org.apache.commons.lang.RandomStringUtils, org.apache.commons.text.RandomStringGenerator, or java.util.concurrent.ThreadLocalRandom in a security sensitive context.
This vulnerability can have up to a CVSSv3 score of 9.8/10 depending upon the use of the insecure data generated.
Design
The design of the query is to find strings that are generated with some sort of insecure randomess component. Then use taint tracking to see if that string ever taints a value that has some sort of security sensitive indicator.
Currently the indicators I'm using are variables/methods that (when lowercase) match a name with the following predicate.
/**
* A 'Sensitive Name' is something that indicates that the name has some sort of security
* implication associated with it's usage.
*/
bindingset[name]
private predicate isSensitiveName(string name) {
name.matches("%pass%") // also matches password
or
name.matches("%tok%") // also matches token
or
name.matches("%secret%")
or
name.matches("%reset%key%") and not name.matches("%value%") // Reduce false positive from 'keyValue' type of methods from maps
or
name.matches("%cred%") // also matches credentials
or
name.matches("%auth%") // also matches authentication
or
name.matches("%sess%id%") // also matches sessionid
}
I've debated adding %accnt%, %account%, and %trusted% as well.
Query Current Status
I'm currently not totally happy with the false-positive rate that this query produces and I'm looking for support/suggestions on how to increase its accuracy.
Additionally, because my 'sink' locations are inherently fuzzy you end up with multiple sink locations for the same source.
Determining whether or not a Random instance is actually a SecureRandom type at runtime is also inherently difficult. Currently, I'm only considering a Random variable 'safe' if at any point it's assigned as a final variable.
For example, these are 'safe' random values:
class Tester {
final Random random = new SecureRandom();
final Random random2;
Tester() {
this.random2 = new SecureRandom();
}
}
If there's a better way to guess the type of a Random expression, I'd love to use it.
Known False Positives
Eclipse/Jetty
The way that Jetty sets it's random number generator is inherently 'safe', in probably 99% of cases.
https://github.com/eclipse/jetty.project/blob/69808d3851de31ffffb343115030d7bcf826ed01/jetty-server/src/main/java/org/eclipse/jetty/server/session/DefaultSessionIdManager.java#L367-L384
Query Results: https://lgtm.com/query/1859599740827953942/
Apache/shiro
https://github.com/apache/shiro/blob/cb3a1b3c0dcd8b89cc8d51ee9ede086882921faa/core/src/main/java/org/apache/shiro/session/mgt/eis/RandomSessionIdGenerator.java
Random random type is ambiguous in my query because there's a setRandom method that could be called by the user.
Apache/cloudstack
https://github.com/apache/cloudstack/blob/ac202639a5108d126ecea585c7e793696679dbe4/utils/src/main/java/com/cloud/utils/PasswordGenerator.java#L53-L107
Can't track that the generateLowercaseChar will always be passed a SecureRandom instance.
Elasticsearch/Elasticsearch
https://github.com/elastic/elasticsearch/blob/a5c40b3d828fe6cc5d722ec134305864d31c6515/server/src/main/java/org/elasticsearch/common/RandomBasedUUIDGenerator.java
Can't track that the Random instance passed to getUUIDBytes will always be an instance of SecureRandom.
More... TODO
Test Repository
I've been testing this query against the code I've been uploading to this repository. Ideally, this repository would get contributed as a part of the tests for this PR when it's complete.
See: https://github.com/JLLeitschuh/bad-random-examples
If there's a better way to guess the type of a Random expression, I'd love to use it.
We have the TypeFlow library, which can do a bit in terms of giving improved type bounds for expressions and fields. It's currently mostly used as part of the virtual dispatch resolution, but it should also be a fine stand-alone library. Just import semmle.code.java.dataflow.TypeFlow and try out the predicate exprTypeFlow(Expr e, RefType t, boolean exact) and predicate fieldTypeFlow(Field f, RefType t, boolean exact) predicates. Note that those predicates only have results for expressions and fields for which the library has determined a type bound that's likely to be better than the static type.
There are a couple of similar "insecure randomness" queries for other languages. Prior to merging, we should consider whether to give this query the same ID (java/insecure-randomness), or whether there is a good reason to differentiate this query from those in the other languages.
@aschackmull The exprTypeFlow was a good tip, thanks.
Got anything for this?
private static final SecureRandom RANDOM_INSTANCE = new SecureRandom();
private String generateString(Random random) {
byte[] array = new byte[];
random.nextBytes(array);
return new String(array);
}
public String generateSecure() { // Thinks this method is tainted
return generateString(RANDOM_INSTANCE);
}
I'm running into this case still with Elasticsearch: https://github.com/elastic/elasticsearch/blob/a5c40b3d828fe6cc5d722ec134305864d31c6515/server/src/main/java/org/elasticsearch/common/RandomBasedUUIDGenerator.java
Your example code snippet should work with exprTypeFlow. However, a prerequisite for inferring a type bound for the parameter is that generateString is private - otherwise it might be called from someplace else with something that isn't a SecureRandom.
Abstracting a bit, the typeflow library is designed for universal flow, i.e. merge points in the flow graph are treated with universal quantification, whereas the general dataflow library essentially uses existential quantification.
If you're generating too many FPs, it might be worth it to consider whether a result should be ruled out by the existence of a SecureRandom that reaches the relevant node, instead of trying to universally prove that all Randoms that get to that point are in fact SecureRandom. This would be slightly more heuristic, but might work better if you're seeing too many FPs.
In the ElasticSearch example, the fact that getBase64UUID is public means that we cannot give a tighter bound on the type of the parameter in getUUIDBytes.
If the relevant FP in ElasticSearch is only in getBase64UUIDSecureString via a data flow return from getUUIDBytes then it is theoretically possible to combine the type flow analysis with the subgraph of data flow related to the source-sink pair and get a better type bound that way. I have plans to do this in order to improve virtual dispatch in data flow, but for this use case it would probably need to be exposed in a slightly different way. These sorts of extensions to data flow are, however, quite non-trivial.
An alternative approach is to perhaps make method calls like nextBytes into additional steps instead of sources, and trace the source further back to a constructor call that definitely isn't new SecureRandom, but rather e.g. new Random. That would deal with the ElasticSearch FP, I think.
This is still a WIP, I'll be coming back to it soon.
This is still a WIP, I'll be coming back to it soon.
Hey @JLLeitschuh do you intend to come back to this PR?
Hey @JLLeitschuh do you intend to come back to this PR?
Very much so. It's been on the back-burner of my mind for the past few months. I probably won't have time/energy to finish it until after August.
I will get back to this eventually
@egregius313 rather old query, but you may find some things valuable, and worth including in the current variant of this query that ended up being merged.