Kotlin: build extractor with bazel

Open redsun82 opened this issue 1 year ago • 2 comments

Usage overview

Building the extractor can be done via

bazel build //java/kotlin-extractor:codeql-extractor-kotlin-<variant>-<version>

where <variant> is either standalone or embeddable, and <version> is one of the supported versions.

For the moment, both variants were tested by swapping them into target/intree/codeql-java and running one relevant integration test.

bazel build //java/kotlin-extractor

will build a default variant:

  • standalone, unless CODEQL_KOTLIN_SINGLE_VERSION_EMBEDDABLE is set to true, in which case it will go for embeddable
  • the version will be taken as the last supported version less than the version of the currently installed kotlinc
    • if CODEQL_KOTLIN_SINGLE_VERSION is set, that will be used instead
    • if kotlinc is not installed, 1.9.20-Beta will be used (the current kotlin toolchain downloaded by rules_kotlin is 1.9.23).
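
The default selection described above can be sketched in Python as follows. This is only an illustration, not the logic from the build files: the SUPPORTED list and all function names are hypothetical, pre-release suffixes such as -Beta are ignored when comparing, and an installed version exactly equal to a supported one is assumed to select it.

```python
import re

# Hypothetical list of supported versions, oldest first (the real list lives in the build files).
SUPPORTED = ["1.5.0", "1.6.0", "1.7.0", "1.8.0", "1.9.0", "1.9.20-Beta"]
# From the description: used when kotlinc is not installed.
FALLBACK = "1.9.20-Beta"


def parse(version):
    """Return a numeric (major, minor, patch) key; suffixes such as -Beta are ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version: %r" % version)
    return tuple(int(group) for group in match.groups())


def default_variant(env):
    """standalone, unless CODEQL_KOTLIN_SINGLE_VERSION_EMBEDDABLE is set to true."""
    if env.get("CODEQL_KOTLIN_SINGLE_VERSION_EMBEDDABLE") == "true":
        return "embeddable"
    return "standalone"


def default_version(env, installed):
    """installed is the version of the kotlinc found on the system, or None if absent."""
    forced = env.get("CODEQL_KOTLIN_SINGLE_VERSION")
    if forced:
        return forced
    if installed is None:
        return FALLBACK
    # Assumption: "last supported version" means the newest one not above the installed one.
    candidates = [v for v in SUPPORTED if parse(v) <= parse(installed)]
    if not candidates:
        raise ValueError("no supported version for kotlinc %s" % installed)
    return candidates[-1]


def default_target(env, installed):
    """Name of the explicit target that the default build would resolve to."""
    return "//java/kotlin-extractor:codeql-extractor-kotlin-%s-%s" % (
        default_variant(env),
        default_version(env, installed),
    )
```

For example, with no environment overrides and no kotlinc installed, default_target({}, None) resolves to the standalone 1.9.20-Beta target.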

If using the provided kotlinc wrapper (see section below), bazel will be aware of the version selected by it and change the default //java/kotlin-extractor target accordingly.

kotlinc wrapper

A cross-platform kotlinc wrapper is provided in java/kotlin-extractor/deps/dev. If this path is added to the PATH environment variable, one can call kotlinc and have the wrapper take care of downloading the appropriate Kotlin compiler under the hood. The desired version can be selected with kotlinc --select x.y.z, or left to the current default of 1.9.0.

Selected and installed version data is stored in .gitignored files in the same directory, and can be cleared with kotlinc --clear.

These -- options cannot conflict with normal kotlinc options, which use a single - prefix instead.
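
Because the wrapper's own options use a double dash, they can be separated from the arguments meant for the real compiler without ambiguity. A minimal Python sketch of that separation is shown here; it is not the wrapper's actual code, and the function name and return shape are hypothetical.

```python
def split_wrapper_args(argv):
    """Split wrapper-only options (--select, --clear) from arguments forwarded to kotlinc.

    Returns (selected_version_or_None, clear_requested, passthrough_args).
    """
    selected = None
    clear = False
    passthrough = []
    args = iter(argv)
    for arg in args:
        if arg == "--select":
            # --select consumes the next argument as the version to use.
            selected = next(args, None)
            if selected is None:
                raise ValueError("--select requires a version, e.g. --select 1.9.0")
        elif arg == "--clear":
            clear = True
        else:
            # Everything else, including single-dash kotlinc options, is forwarded untouched.
            passthrough.append(arg)
    return selected, clear, passthrough
```

With this shape, split_wrapper_args(["--select", "1.8.0", "-version"]) selects 1.8.0 and forwards only -version to the compiler.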

If ripunzip is installed (or, on Windows from within semmle-code, in any case, since ripunzip.exe is provided there), it will be used, which results in faster Kotlin installations.

Interactions with the existing build

For the moment nothing changes on the internal build, which will still call build.py. After we integrate this into the internal build system we can come back here and clean up obsolete files.

Implementation notes

Version variants

rules_kotlin does not seem to really support building with different -language-version settings in the same workspace, so I ended up patching it to allow specifying that at a kt_jvm_library level via kt_kotlinc_options. This allows defining the different extractor versions in the same build file using a standard [...] comprehension.
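
The shape of such a comprehension can be sketched in plain Python (Starlark comprehensions have the same syntax). This is a stand-in only: it builds dictionaries rather than calling kt_jvm_library or kt_kotlinc_options, the VERSIONS list is hypothetical, and deriving -language-version from the major.minor prefix of the extractor version is an assumption.

```python
# Hypothetical version list; the real one is defined in the build files.
VERSIONS = ["1.8.0", "1.9.0", "1.9.20-Beta"]


def language_version(version):
    """Assumption: the language version is the major.minor prefix, e.g. 1.9.20-Beta -> 1.9."""
    major, minor = version.split(".")[:2]
    return "%s.%s" % (major, minor)


def extractor_targets(variant, versions=VERSIONS):
    """One target description per version, each carrying its own compiler options."""
    return [
        {
            "name": "codeql-extractor-kotlin-%s-%s" % (variant, version),
            "kotlinc_opts": ["-language-version", language_version(version)],
        }
        for version in versions
    ]
```

The point of the patch is the per-target kotlinc_opts entry: without it, all targets in the workspace would share one language version.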

Standalone / embeddable

Embeddable requires not only changing one dependency, but also some imports in the source code itself. This is achieved by reexporting and patching the code in the @codeql_kotlin_embeddable external repository. The java/kotlin-extractor/BUILD.bazel file is shared with that repository, and behaves slightly differently depending on that (based on repo_name()).

External dependencies

The Kotlin dependencies are given as LFS files in java/kotlin-extractor/deps. However, normal codeql users won't have those files downloaded, whether they have git lfs installed or not, as fetching those files is excluded by the repo's .lfsconfig. Additionally, those LFS files won't count towards users' quotas on forks, and users not having write permissions on github/codeql won't be able to push new LFS files there (so we can't be "LFS bombed").

Building the extractor requires git lfs to be installed on the system; the dependencies are then checked out lazily, on demand, within the bazel cache. This means a bazel clean will clear those files out. However, if those files are present in the checkout as real files and not just LFS pointers (as happens, for example, when updating or adding dependencies, or if git config lfs.fetchinclude java/kotlin-extractor/deps/* is specified followed by a git lfs checkout), then bazel will symlink them as they are, without incurring any LFS download overhead, including after bazel clean.
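
The distinction between a real file and an LFS pointer can be illustrated with a short Python sketch. It relies only on the fact that Git LFS pointer files are small text files starting with a fixed version line; the function names and the returned descriptions are hypothetical and do not reflect how the bazel rules are actually written.

```python
# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(data):
    """True if the file content is an LFS pointer rather than the real artifact."""
    return data.startswith(LFS_POINTER_PREFIX)


def plan_for(data):
    """Describe which of the two paths from the text applies to this file content."""
    if is_lfs_pointer(data):
        return "fetch lazily into the bazel cache"
    return "symlink from the checkout"
```

A jar that is really present in the checkout starts with the zip magic bytes PK, so plan_for returns the symlink path for it.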

Updating the dependencies doesn't require any special actions: when added and pushed the files will be uploaded as LFS files, and locally they will be picked up by bazel without any problem.

rules_kotlin patching

We do need to patch rules_kotlin to provide different -language-version values for different bazel targets. After some discussion we decided to do that with a local bazel registry rather than an override, because that will make sharing with the internal semmle-code repo easier.

redsun82, Apr 04 '24 08:04

Notice that in build.py we pass -J-Xmx2G to kotlinc to allow running our build through codeql. I just tested that analysing the bazel build of the Kotlin extractor works with the usual recommendations we give for analysing through bazel, without any further options. If I understood correctly, Kotlin compilation in bazel actually happens in the same JVM instance where the whole bazel process resides, and it probably has enough resources already.

redsun82, Apr 09 '24 14:04

As an experiment, I tried having the rules_kotlin patch as an override in a local bazel registry here. That would help sharing the patch with semmle-code but, as can be seen, it includes a lot of boilerplate, so I'm not sure it's worth it.

redsun82, Apr 09 '24 16:04

I resurrected some threads that imo still need addressing.

I'm also interested in the future of the hand-written build system: Do we plan to delete this as soon as we've moved the internal build off it? Do we plan to keep it around?

My idea would be to remove all unneeded scripts after the internal repo change is merged.

redsun82, Jun 04 '24 13:06