codeql
codeql copied to clipboard
JS: refactor most library models away from AST nodes
Initially I started with refactoring the HTTP models to use dataflow nodes.
That was done surprisingly quick, so I continued with more library models.
I've made deprecated aliases where a renaming made sense, and otherwise I just changed the type.
The flowsTo predicates in the HTTP models were confusing when starting to use DataFlow::Node. (I initially made mistakes because of that). So I've deprecated those flowsTo predicates.
Evaluation
An evaluation looks OK. But there are some new results and some lost results.
New results:
All appear to be TPs, and they are all from recognizing destructuring assignments as property reads, which allows us to recognize more sources.
Two of the results appeared from refactoring the HTTP models (the js/path-injection results).
And two other appeared from refactoring the SensitiveExpr class (the js/insufficient-password-hash resutls).
Lost results:
Both lost results are a consequence of a DataFlow::PropWrite having no result for getRhs() when the AST looks like:
obj.prop += value
One appears to be a TP, the other appears to be an FP.
I should do some followup work to investigate if we can have a sensible result for getRhs() in the above example. (After this PR has merged).
Not qualified to sign off these changes to the codeql-ml code - will need to wait until @henrymercer is back on 6th April
I've done a force push to be consistent in how I've deprecated previously abstract classes.
I now in all cases do it like:
deprecated class OldClass extends DataFlow::Node { // <- not abstract
NewClass node;
OldClass() {
this.flow() = node
}
}
I could also do them as abstract classes and add implementation classes that ensure that if you extend NewClass than those nodes will also be added to the OldClass (and vice versa). But that would require 2 more classes for every class I've deprecated, and that's a lot of deprecated code to keep around.
I'm worried about losing .innerHTML += x as a sink, even temporarily. Could you reduce the risk here, by making DomSink specifically add this back, perhaps via a helper predicate in DomPropertyWrite?
:+1:
DomPropertyWrite already defined it's own getRhs() predicate, so I just added another case there.
I think I've addressed all your review comments.
Thanks. Would you mind running a final evaluation? It's a large change and we should reduce the risk as much as possible.
Thanks. Would you mind running a final evaluation? It's a large change and we should reduce the risk as much as possible.
A new evaluation shows a small performance regression.
But I haven't been able to nail down anything that's causing the regression, I think it's a combination of many predicates that now have a slightly worse join-order.
The two new results in the evaluation are both from destructuring assignment being seen as property reads.
@asgerf are you OK with merging?
I expanded the change-note to include which classes have changed type.
I created it using a one-off QL-for-QL query (I had get both versions into the same DB).
Here is the raw data from the query: https://docs.google.com/spreadsheets/d/1VkOGUofigegJYJ9un-qRczCBiEeNDEcTUsmFim90gRc/edit?usp=sharing
I ran the explicit-this patch to fix the alerts reported, and I fixed an instance of ql/acronyms-should-be-pascal-case in one of the new classes.
So now there's no new QL-for-QL warnings from this PR.
@asgerf I mentioned the classes with changed base-types in the change-note.
I excluded deprecated classes and classes from Customizations files.
So you think that works?
I did a rebase to fix the merge conflict.
I'm also doing a new evaluation just to see if that looks OK.
New evaluation looks similar to the previous evaluation.
There is still a slight performance regression in the same benchmarks, but still no clear cause.
It might just be from there being more rows to process.
I still think we can merge it.
Edit: Did a merge with main to solve a merge conflict.
Merge after Universe release.
~~The JavaScript language tests are broken, but that's unrelated.
I got an internal PR that fixes it (see the backref).~~.
Fixed.
The new results are due to recognizing object destructuring as property reads.