mill feat: Add testQuick for fine-grained selective test execution using codesig

Summary

Implements fine-grained selective test execution for Java modules using Mill's existing codesig bytecode callgraph analysis. This addresses issue #4109.

Key Changes

New testQuick task in TestModule.scala that only runs tests affected by code changes since the last successful run
CodeSig worker module (CodeSigWorkerModule.scala) providing isolated classloader-based codesig computation
Worker implementation (CodeSigWorker.scala) invoking CodeSig.compute() to get method-level bytecode signatures
Integration test demonstrating selective test execution with Java module

How testQuick Works

testQuick provides incremental test execution using the codesig callgraph.

First run:

Acts like test. All tests execute, and during this run:

Method-level bytecode signatures are computed via codesig.
These are aggregated into class-level hashes.
For each test class, dependent classes (based on the callgraph) are recorded.
A snapshot of dependency hashes and test outcomes is written to the module's out directory.
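
A minimal sketch of the aggregation and dependency steps above. It assumes method signatures are keyed as `pkg.Class#method(args)ret` and that call edges have already been lifted to class level. The names ClassHashSketch, owningClass, classHashes, and dependencies are hypothetical and not taken from the PR; the actual implementation may combine hashes differently.

```scala
import scala.annotation.tailrec

object ClassHashSketch {
  // Assumption: keys look like "pkg.Class#method(args)ret".
  // The real codesig key format may differ.
  def owningClass(methodSig: String): String =
    methodSig.takeWhile(_ != '#')

  // Fold method-level hashes into one hash per class. Sorting by signature
  // keeps the result independent of map iteration order.
  def classHashes(methodHashes: Map[String, Int]): Map[String, Int] =
    methodHashes
      .groupBy { case (sig, _) => owningClass(sig) }
      .map { case (cls, methods) =>
        val combined = methods.toSeq.sortBy(_._1).foldLeft(17) {
          case (acc, (sig, hash)) => 31 * (31 * acc + sig.hashCode) + hash
        }
        cls -> combined
      }

  // Transitive closure over class-level call edges, starting at a test class.
  // The test class itself is excluded from the result.
  def dependencies(testClass: String, edges: Map[String, Set[String]]): Set[String] = {
    @tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] =
      frontier match {
        case Nil => seen
        case head :: tail =>
          val next = edges.getOrElse(head, Set.empty[String]) -- seen
          loop(next.toList ::: tail, seen ++ next)
      }
    loop(List(testClass), Set(testClass)) - testClass
  }
}
```

Because the closure is transitive, a change in a class that a test reaches only indirectly still shows up in that test's recorded dependencies.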

Subsequent runs:

testQuick recomputes class-level hashes and compares them to the snapshot.

A test is re-run only if:

Its compiled class changed,
Any dependency class changed,
It previously failed,
It is newly added.
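
The four conditions above can be sketched as a single predicate. This is an illustration only: TestRecord, isDirty, and select are hypothetical names, and the record layout is an assumption about what the snapshot holds per test class.

```scala
object DirtySketch {
  // Hypothetical per-test snapshot entry; field names are illustrative.
  final case class TestRecord(
      classHash: Int,
      depHashes: Map[String, Int],
      passed: Boolean
  )

  def isDirty(
      testClass: String,
      current: Map[String, Int],
      previous: Map[String, TestRecord]
  ): Boolean =
    previous.get(testClass) match {
      // Not in the snapshot: the test is newly added.
      case None => true
      case Some(rec) =>
        // Failed last time, own class hash changed, or a dependency changed
        // (a dependency that disappeared also counts as changed).
        !rec.passed ||
        !current.get(testClass).contains(rec.classHash) ||
        rec.depHashes.exists { case (dep, hash) => !current.get(dep).contains(hash) }
    }

  // Keep only the tests that need to run, preserving input order.
  def select(
      tests: Seq[String],
      current: Map[String, Int],
      previous: Map[String, TestRecord]
  ): Seq[String] =
    tests.filter(isDirty(_, current, previous))
}
```

With an empty `previous` map every test is dirty, which matches the first-run behaviour of acting like test.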

Persistence:

testQuick maintains a per-module JSON snapshot representing the state after the last successful run. This snapshot stores:

Class-level bytecode hashes for all classes on the run/test classpaths
For each test class: dependency classes, their hashes, and pass/fail result

The snapshot is written to Task.dest, participating in Mill's standard clean/isolated semantics. If the snapshot is missing or incompatible, testQuick falls back to a full run and writes a fresh snapshot.
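
A rough sketch of the fallback behaviour, under stated assumptions: to stay dependency-free it uses a simple line-based encoding as a stand-in for the JSON snapshot, and the names SnapshotSketch, FormatVersion, Plan, FullRun, and Incremental are hypothetical. It only illustrates that a missing, unreadable, or incompatible snapshot leads to a full run.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object SnapshotSketch {
  // Hypothetical format version; bumping it invalidates older snapshots.
  val FormatVersion = 1

  final case class Snapshot(version: Int, classHashes: Map[String, Int])

  sealed trait Plan
  case object FullRun extends Plan
  final case class Incremental(previous: Snapshot) extends Plan

  // Stand-in encoding: first line is the version, then one "class=hash" per line.
  def write(file: Path, snap: Snapshot): Unit = {
    val lines = snap.version.toString +:
      snap.classHashes.toSeq.sorted.map { case (cls, hash) => s"$cls=$hash" }
    Files.write(file, lines.mkString("\n").getBytes("UTF-8"))
  }

  // Any I/O or parse failure, or a version mismatch, yields None.
  def read(file: Path): Option[Snapshot] =
    Try {
      val lines = new String(Files.readAllBytes(file), "UTF-8").split("\n").toList
      val version = lines.head.toInt
      val hashes = lines.tail.filter(_.nonEmpty).map { line =>
        val idx = line.lastIndexOf('=')
        line.take(idx) -> line.drop(idx + 1).toInt
      }.toMap
      Snapshot(version, hashes)
    }.toOption.filter(_.version == FormatVersion)

  // No usable snapshot means every test runs and a fresh snapshot is written.
  def plan(file: Path): Plan =
    read(file).fold[Plan](FullRun)(Incremental(_))
}
```

Treating every failure mode the same way keeps recovery simple: deleting the out directory, a corrupted file, and a format change all result in one full run.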

Benefits

Uses existing codesig infrastructure (same as selective execution)
Works at bytecode level - no need for additional analysis tools
Persists state between runs for incremental testing
Falls back to full test run when state is missing
No new caching layers - all persistence uses Mill's existing out/ structure

Files Changed

libs/javalib/api/src/mill/javalib/codesig/CodeSigWorkerApi.scala - Worker API trait
libs/javalib/src/mill/javalib/codesig/CodeSigWorkerModule.scala - External module
libs/javalib/codesig-worker/src/mill/javalib/codesig/CodeSigWorker.scala - Worker impl
libs/javalib/src/mill/javalib/TestModule.scala - Added testQuick task
libs/javalib/src/mill/javalib/JavaModule.scala - Added methodCodeHashSignatures
libs/javalib/package.mill - Added codesig-worker module
website/docs/modules/ROOT/pages/javalib/testing.adoc - Documentation
Integration test files for testQuick functionality

Test Plan

[ ] Run existing Mill test suite
[ ] Run new TestQuickJavaModuleTests integration test
[ ] Manual verification with sample Java project

Closes #4109

Dec 05 '25 15:12 SolariSystems

@SolariSystems can you explain to me how the persistence of the state between runs works?

Dec 06 '25 11:12 lihaoyi

Also, if you could explain in the PR description how it works in general and how it is used, that would be great.

Dec 06 '25 11:12 lihaoyi

Here is how persistence works.

testQuick maintains a per-module JSON snapshot that represents the state of the world after the last successful run.

What is stored:

Class-level bytecode hashes for all classes on the run/test classpaths (derived from codesig's method-level signatures).
For each test class:
- The set of dependency classes referenced in the codesig callgraph.
- The class-level hashes of those dependencies at the time of the run.
- The pass/fail result of the test.

This snapshot is written into the module's Mill out directory (Task.dest), so it participates in Mill's standard clean/isolated semantics.

How it is used on subsequent runs:

Current class-level hashes are recomputed.
The previous snapshot is loaded (if available).
A test class is marked "dirty" if:
- Its own class hash changed,
- Any stored dependency hash changed,
- It failed in the previous run,
- It is new and did not exist in the snapshot.
Only dirty tests are executed. Everything else is skipped.

If the snapshot is missing, unreadable, or incompatible, testQuick falls back to a full run and writes a fresh snapshot. This ensures clean recovery without manual intervention.

Dec 06 '25 14:12 SolariSystems

testQuick provides incremental test execution using the codesig callgraph.

First run:

Acts like test. All tests execute, and during this run:

Method-level bytecode signatures are computed via codesig.
These are aggregated into class-level hashes.
For each test class, dependent classes (based on the callgraph) are recorded.
A snapshot of dependency hashes and test outcomes is written to the module's out directory.

Subsequent runs:

testQuick recomputes class-level hashes and compares them to the snapshot.

A test is re-run only if:

Its compiled class changed,
Any dependency class changed,
It previously failed,
It is newly added.

This yields fine-grained selective testing with no new caching layers. All persistence uses Mill's existing out/ structure and invalidates cleanly when the directory is removed.

Dec 06 '25 14:12 SolariSystems

Did you run the tests? They seem to be failing, along with the MiMa binary compatibility checks.

Dec 07 '25 00:12 lihaoyi

Thank you for flagging this. The MiMa check was failing because methodCodeHashSignatures was declared as an abstract method in the public TestModule trait. MiMa correctly flags this as a binary-incompatible change, since it forces every existing subclass to implement the new method.

Root cause: Adding an abstract method to a public trait is a binary-breaking change.

Fix (commit e2da3ca8363): Provided a concrete default implementation:

def methodCodeHashSignatures: T[Map[String, Int]] = Task { Map.empty[String, Int] }

This preserves backward compatibility: existing TestModule implementations continue to work unchanged, while modules opting into testQuick can override this method to enable fine-grained selective testing.
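
To illustrate the pattern in isolation, here is a simplified sketch that uses plain values instead of Mill tasks. TestModuleLike, LegacyModule, and OptInModule are hypothetical names; the real trait returns a T[...] and may behave differently under MiMa, so CI remains the authority.

```scala
object DefaultMethodSketch {
  // Simplified stand-in for the public trait. The concrete default means
  // implementers are not required to define the method themselves.
  trait TestModuleLike {
    def methodCodeHashSignatures: Map[String, Int] = Map.empty[String, Int]
  }

  // An implementation written before the method existed: it compiles
  // unchanged and inherits the empty default.
  object LegacyModule extends TestModuleLike

  // A module that opts in supplies its own signatures.
  object OptInModule extends TestModuleLike {
    override def methodCodeHashSignatures: Map[String, Int] =
      Map("a.Foo#bar()void" -> 42)
  }
}
```

Had the method been abstract, LegacyModule would not compile without adding an implementation.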

I should be upfront: I don't have a local Mill development environment set up to run the full test suite myself. The fix is based on understanding MiMa's binary compatibility rules and reviewing similar patterns in the codebase. CI will verify whether this resolves the issue.

Let me know if you'd like any changes to the approach.

Dec 07 '25 01:12 SolariSystems

Turning this into a Draft since it's not quite ready yet. As mentioned in developer.adoc (https://github.com/com-lihaoyi/mill/blob/main/developer.adoc#continuous-integration--testing), please make sure CI is green on your fork before marking it as ready for review.

Dec 07 '25 01:12 lihaoyi