Overlay: Add overlay annotations to Java & shared libraries
This PR adds overlay annotations to Java libraries and shared libraries to support experimentation with Java overlay analysis. The annotations were added automatically by the add-overlay-annotations.py script. The high-level intent is that dataflow should be global, while as much as possible below dataflow should be local. To achieve this, the script adds top-level overlay[local?] annotations to Java and shared libraries based on a simple heuristic:
- skip library files that end with Test.qll
- skip library files that end with Query.qll or Config.qll if they contain implements DataFlow::ConfigSig.
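The selection heuristic above can be expressed as a small check over a file's path and contents. The following Python is a minimal sketch under stated assumptions: the function name should_annotate and the restriction to .qll files are illustrative, and this is not the actual code of add-overlay-annotations.py.

```python
# Hypothetical sketch of the file-selection heuristic; should_annotate is an
# illustrative name, not a function from add-overlay-annotations.py.
CONFIG_SIG = "implements DataFlow::ConfigSig"


def should_annotate(path: str, contents: str) -> bool:
    """Return True if a QL library file should get a top-level overlay[local?] annotation."""
    if not path.endswith(".qll"):
        return False  # assumption: only .qll library files are candidates
    if path.endswith("Test.qll"):
        return False  # test support libraries are skipped
    if path.endswith(("Query.qll", "Config.qll")) and CONFIG_SIG in contents:
        return False  # defines a data flow configuration, so it belongs to global dataflow
    return True


if __name__ == "__main__":
    print(should_annotate("java/ql/lib/semmle/code/java/Foo.qll", "predicate p() { any() }"))
    print(should_annotate("java/ql/lib/FooTest.qll", ""))
```

Because the check is a literal substring match, a file that implements a different signature module (for example DataFlow::StateConfigSig) would not be skipped, which is one way such a heuristic stays incomplete.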
For files selected for annotation, the script also adds overlay[caller] annotations on all public predicates annotated with pragma[inline] to ensure that those predicates will still be inlined across the overlay frontier once overlay compilation is enabled. See the internal Incremental CodeQL docs for additional details.
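The overlay[caller] step can be sketched as a line-based rewrite. This is a hypothetical Python illustration, not the script's actual logic. It assumes that annotations and the predicate head sit on the same or adjacent lines, treats any predicate without a private annotation as public (ignoring, for instance, predicates nested inside private modules or classes), and takes the annotation text as a parameter, since the annotation was renamed to overlay[caller?] later in this PR.

```python
import re

# Hypothetical sketch: the function name and the line-based parsing are
# illustrative assumptions, not the logic of add-overlay-annotations.py.
PRAGMA_INLINE = re.compile(r"pragma\[inline\]")
PRIVATE = re.compile(r"\bprivate\b")


def add_caller_annotations(source: str, annotation: str = "overlay[caller]") -> str:
    """Insert `annotation` on its own line before each pragma[inline] on a non-private predicate."""
    lines = source.splitlines()
    out: list[str] = []
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        is_comment = stripped.startswith(("//", "/*", "*"))
        if PRAGMA_INLINE.search(line) and not is_comment:
            # The declaration spans from the pragma line to the first line with a
            # "(", taken to be the predicate head (assumption: no comments in between).
            decl: list[str] = []
            for follow in lines[i:]:
                decl.append(follow)
                if "(" in follow:
                    break
            previous = lines[i - 1].strip() if i > 0 else ""
            # Skip pragmas that already carry an overlay[caller...] annotation.
            already_annotated = previous.startswith("overlay[caller")
            is_private = previous == "private" or any(PRIVATE.search(d) for d in decl)
            if not already_annotated and not is_private:
                indent = line[: len(line) - len(stripped)]
                out.append(indent + annotation)
        out.append(line)
    trailing = "\n" if source.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Running the function twice is a no-op, because a pragma already preceded by an overlay[caller...] line is left alone.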
I recommend reviewing this PR by reviewing the script that generated the annotations.
Some of the annotated Java library files are also used by Python and C# and the annotations have therefore also been added to the Python and C# variants through sync-files.py.
This PR does not enable overlay compilation for Java and therefore currently has no effect on the generated DIL, RA or QLX (as witnessed by the uneventful DCA runs). The only current impact of the annotations is that the compiler checks for overlay annotation errors; in the absence of errors, the annotations have no effect on compilation. See the internal Incremental CodeQL docs for additional details.
A CI check will be added in a subsequent PR to enforce usage of the add-overlay-annotations.py script to automatically add overlay annotations to newly added files.
- skip library files that end with Query.qll or Config.qll if they contain implements DataFlow::ConfigSig
This is currently a fairly incomplete heuristic (there are other places that invoke global data flow), but I guess it provides a decent enough starting point, and further manual tweaking is to be expected.
Agreed. I think we should keep the current heuristic for now as we have tested the results extensively. However, if you have concrete ideas for extending the heuristic, I would be happy to try to incorporate them in the future.
@hvitved @tausbn Would you mind reviewing the C# and Python parts of this PR (commit 2)? The PR adds overlay annotations for Java, but since a few files are still sync'ed between languages, it also affects a few C# and Python files. Overlay compilation is still disabled for C# and Python and will remain so for the foreseeable future, so the annotations won't have any effect on compilation for C# or Python beyond additional error checks. The overlay annotations are documented here, and I would also be happy to give a quick intro to them as context.
You can take my approval as approval for C# and Python as well. 🦭
@aschackmull We ended up renaming the previous overlay[caller] annotation to overlay[caller?]. As a result, I've added a third commit that renames all overlay[caller] annotations to overlay[caller?]. Commits 1 and 2 haven't changed since your last review, but have been rebased. The Compile all queries CI check should succeed once the next CLI is released.
Merged in main to resolve a merge conflict with the recently merged shared Guards library. Needed one additional overlay annotation in the shared Guards library.
The failing Code scanning results / CodeQL CI check indicates that the PR introduces two new QL4QL alerts classified as errors, but lists only one, concerning the naming of a predicate argument that QL4QL incorrectly interprets as a module name. It looks like the remaining CodeQL results were truncated due to the size of the PR. Merging despite the failing Code scanning results check.