codeql icon indicating copy to clipboard operation
codeql copied to clipboard

JS: refactor most library models away from AST nodes

Open erik-krogh opened this issue 3 years ago • 10 comments
trafficstars

Initially I started with refactoring the HTTP models to use dataflow nodes.
That was done surprisingly quick, so I continued with more library models.

I've made deprecated aliases where a renaming made sense, and otherwise I just changed the type.

The flowsTo predicates in the HTTP models were confusing when starting to use DataFlow::Node. (I initially made mistakes because of that). So I've deprecated those flowsTo predicates.

Evaluation

An evaluation looks OK. But there are some new results and some lost results.

New results:

All appear to be TPs, and they are all from recognizing destructuring assignments as property reads, which allows us to recognize more sources.
Two of the results appeared from refactoring the HTTP models (the js/path-injection results).
And two other appeared from refactoring the SensitiveExpr class (the js/insufficient-password-hash resutls).

Lost results:

Both lost results are a consequence of a DataFlow::PropWrite having no result for getRhs() when the AST looks like:

obj.prop += value

One appears to be a TP, the other appears to be an FP.

I should do some followup work to investigate if we can have a sensible result for getRhs() in the above example. (After this PR has merged).

erik-krogh avatar Mar 30 '22 10:03 erik-krogh

Not qualified to sign off these changes to the codeql-ml code - will need to wait until @henrymercer is back on 6th April

annarailton avatar Apr 05 '22 10:04 annarailton

I've done a force push to be consistent in how I've deprecated previously abstract classes.
I now in all cases do it like:

deprecated class OldClass extends DataFlow::Node { // <- not abstract
  NewClass node;
  OldClass() {
    this.flow() = node
  }
}

I could also do them as abstract classes and add implementation classes that ensure that if you extend NewClass than those nodes will also be added to the OldClass (and vice versa). But that would require 2 more classes for every class I've deprecated, and that's a lot of deprecated code to keep around.

erik-krogh avatar Apr 05 '22 11:04 erik-krogh

I'm worried about losing .innerHTML += x as a sink, even temporarily. Could you reduce the risk here, by making DomSink specifically add this back, perhaps via a helper predicate in DomPropertyWrite?

:+1: DomPropertyWrite already defined it's own getRhs() predicate, so I just added another case there.


I think I've addressed all your review comments.

erik-krogh avatar Apr 05 '22 12:04 erik-krogh

Thanks. Would you mind running a final evaluation? It's a large change and we should reduce the risk as much as possible.

asgerf avatar Apr 06 '22 06:04 asgerf

Thanks. Would you mind running a final evaluation? It's a large change and we should reduce the risk as much as possible.

A new evaluation shows a small performance regression.
But I haven't been able to nail down anything that's causing the regression, I think it's a combination of many predicates that now have a slightly worse join-order.
The two new results in the evaluation are both from destructuring assignment being seen as property reads.

@asgerf are you OK with merging?

erik-krogh avatar Apr 07 '22 07:04 erik-krogh

I expanded the change-note to include which classes have changed type.
I created it using a one-off QL-for-QL query (I had get both versions into the same DB).

Here is the raw data from the query: https://docs.google.com/spreadsheets/d/1VkOGUofigegJYJ9un-qRczCBiEeNDEcTUsmFim90gRc/edit?usp=sharing

erik-krogh avatar Apr 25 '22 14:04 erik-krogh

I ran the explicit-this patch to fix the alerts reported, and I fixed an instance of ql/acronyms-should-be-pascal-case in one of the new classes.
So now there's no new QL-for-QL warnings from this PR.


@asgerf I mentioned the classes with changed base-types in the change-note.
I excluded deprecated classes and classes from Customizations files.
So you think that works?

erik-krogh avatar Apr 28 '22 11:04 erik-krogh

I did a rebase to fix the merge conflict.

I'm also doing a new evaluation just to see if that looks OK.

erik-krogh avatar Apr 29 '22 14:04 erik-krogh

New evaluation looks similar to the previous evaluation.
There is still a slight performance regression in the same benchmarks, but still no clear cause.
It might just be from there being more rows to process.
I still think we can merge it.


Edit: Did a merge with main to solve a merge conflict.

erik-krogh avatar May 02 '22 19:05 erik-krogh

Merge after Universe release.

calumgrant avatar Aug 08 '22 13:08 calumgrant

~~The JavaScript language tests are broken, but that's unrelated.
I got an internal PR that fixes it (see the backref).~~. Fixed.

erik-krogh avatar Sep 05 '22 18:09 erik-krogh

A new evaluation looks good.

The new results are due to recognizing object destructuring as property reads.

erik-krogh avatar Sep 06 '22 07:09 erik-krogh