Dataflow: Add support for speculative taint flow.
This adds support for speculative taint flow in the shared taint tracking library.
What is this?
This is a magic button (a dial, really) that you can turn to calculate more taint flow in order to identify false negatives. So if you suspect a false negative (FN), e.g. because you're failing to find a flow for a CVE, or because you're getting zero results and suspect that we might be missing some models, then try this!
How does it work?
Each language provides a large candidate set of potential taint steps. The default set that I've implemented is simply flow from any argument to the return value (plus potential side effects on the this argument, if any) for any call that has neither an existing model nor a call target within the analyzed source.
The shared library will then execute the regular taint flow, but in addition it will allow speculative flow steps drawn from this candidate set up to a specified maximum number of such edges along a given path.
It will then report flow in the usual way, and the chosen speculative edges will be visible in the path explanation with the provenance label "Speculative". (In the VSCode plugin this shows up as "(step) Speculative".)
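To make the bounded-speculation idea concrete, here is a minimal conceptual sketch in Python. It is not the shared library's implementation: the function name speculative_reach, the dictionary-based graph encoding, and the toy node names are all hypothetical, and it ignores everything the real data flow library does (call contexts, access paths, flow state). It only shows how regular edges are always followed, while speculative edges are followed only while the per-path count is below the limit.

```python
from collections import deque


def speculative_reach(sources, regular_edges, speculative_edges, limit):
    """Hypothetical sketch: return {node: fewest speculative steps needed},
    allowing at most `limit` speculative steps along any single path."""
    best = {}
    start = [(s, 0) for s in sources]
    seen = set(start)      # visited (node, speculative steps used) states
    queue = deque(start)
    while queue:
        node, used = queue.popleft()
        if node not in best or used < best[node]:
            best[node] = used
        # Regular taint steps never consume the speculation budget.
        for nxt in regular_edges.get(node, ()):
            state = (nxt, used)
            if state not in seen:
                seen.add(state)
                queue.append(state)
        # Speculative steps are only allowed while budget remains.
        if used < limit:
            for nxt in speculative_edges.get(node, ()):
                state = (nxt, used + 1)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return best


# Toy graph (hypothetical): two unmodelled calls sit between source and sink.
regular = {"source": ["a"], "b": ["c"]}
speculative = {"a": ["b"], "c": ["sink"]}

for limit in (0, 1, 2):
    print(limit, speculative_reach(["source"], regular, speculative, limit))
```

In this toy graph the sink only becomes reachable once the limit is at least 2, because the path needs two speculative edges. Note that the sketch tracks (node, steps used) pairs rather than plain nodes, so its search space grows with the limit.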
I want to try it, show me how!
It's easy: just replace the application of the TaintTracking::Global module with TaintTracking::SpeculativeFlow. For example, if you have
module MyQueryFlow = TaintTracking::Global<MyQueryConfig>;
then replace that with
int speculationLimit() { result = 10 }
module MyQueryFlow = TaintTracking::SpeculativeFlow<MyQueryConfig, speculationLimit/0>;
The number you choose in the speculationLimit is the limit on the number of speculative steps that can be used in a path. A higher number gives more flow, but worse performance. Expect a performance degradation factor roughly equal to the chosen limit.
Testing so far, and followup work for the individual language teams
I've tested this for Java and C# with a number of queries on their respective MRVA top100 with good results and reasonable performance. For the remaining languages, the candidate set of edges might need further tweaking, e.g. to exclude things that happen to be calls but shouldn't be considered potential taint steps. For C#, for example, I had to reduce the set to "only" include method and constructor calls, i.e. no operator or property calls (I believe the latter are already included as read/store steps).
I gather it's not (currently) possible to combine speculative taint flow with an existing flow state in the query?
No, I can happily report that combining the two very much is supported!
Is it possible for any of the changes to affect performance when speculation is not being used?
No.
For Ruby, I've now made some exclusions guided by the consistency check. Reviewers, please check whether those are reasonable.