This PR optimizes and reworks the compilation pipeline, with regards to SemanticDB generation.

Pre-PR situation

there were separate compile and semanticDbDetailed tasks.
compile performed the compilation without semantic DB plugin.
semanticDbDetailed performed the compilation with the semantic DB plugin, but did not reuse the compile's output.

Which meant that if any of these happened:

mill cli invoked compile and then semanticDbData.
mill BSP invoked compile and then any other task that required the semantic db, or vice-versa.

The compilation would have been performed twice, wasting CPU cycles and worsening the developer experience.

Post PR situation

Mill now smartly chooses whether compile produces semanticdb data or not. semanticdb is produced if:

compile was directly invoked by a task that needs semanticdb.
there is at least one BSP client that requires semanticdb to be produced.

Implementation details

Introduction of `MILL_BSP_OUTPUT_DIR`

Previously you could use MILL_OUTPUT_DIR environment variable to set both regular and bsp mill's output directory to a certain folder.

Because regular mill now needs to know the location of the BSP folder, having one variable is problematic:

you run MILL_OUTPUT_DIR=out_bsp ./mill --bsp ...
you want to run regular mill, but provide it a changed path for bsp mill.
MILL_OUTPUT_DIR=out_bsp ./mill ... changes the regular mill out folder.

Thus MILL_BSP_OUTPUT_DIR is introduced, which allows you to:

MILL_BSP_OUTPUT_DIR=out_bsp ./mill --bsp ...
MILL_BSP_OUTPUT_DIR=out_bsp ./mill ... # this still uses the regular out/ folder, but knows where bsp mill out/ folder is located

`BuildCtx.bspSemanticDbSessionsFolder`

Folder in the filesystem where Mill's BSP sessions that require semanticdb store an indicator file (name = process PID, contents are irrelevant) to communicate to main Mill daemon and other BSP sessions that there is at least one Mill session that will need the semanticdb.

The reasoning is that if at least one of Mill's clients requests semanticdb, then there is no point in running regular compile without semanticdb, as eventually we will have to rerun it with semanticdb, and thus we should compile with semanticdb upfront to avoid paying the price of compling twice (without semanticdb and then with it).

`CompilationResult.semanticDbFiles`

Because we can't change the return type of compile due to binary compatibility, semanticDbFiles field was added to CompilationResult and compile fills it in if the compilation happened with semanticdb enabled.

Removal of `CompileFor` and related tasks

These are not needed anymore with the single compile task.

Removal of separate `SemanticDbJavaModule.semanticDbDataDetailed` task

It's functionality was merged to compileInternal, which takes a compileSemanticDb parameter.

There also was a lot of code duplication between compile and semanticDbDataDetailed tasks, for both java and scala modules.

Replace the implicit `Task.dest` usage in `ZincWorker` with explicit `compileTo` argument

This makes it clearer what the parameter is used for and allows to reuse the same value in SemanticDbJavaModule.enhanceCompilationResultWithSemanticDb invocation.

Misc changes

Moved JavaModule#resolveRelativeToOut instance method to UnresolvedPath.resolveRelativeToOut.
Improved Server to provide better debugging output if the server cannot be launched.
testScala212Version updated to 2.12.20 because new semanticdb plugin is not provided for the ancient 2.12.6 version that was used.

Fixes: https://github.com/com-lihaoyi/mill/issues/5744

Sep 11 '25 10:09 arturaz

TBH, the current state (of this PR) looks way to complicated and makes too much assumptions for my taste. But I admit, the current codebase (in this context) is already far from being simple.

I'd like to split the problem and provide clear solutions separated from each other.

Issue 1: Bad compilation performance

we currently compile too much, since the semanticDbData task duplicates compilation work already done in the compile task.
by always compiling with the semanticDB generator enabled, we could optimize the compilation for the BSP use case and would also ensure sync'ed results, but we potenially leak unwanted semanticDb data downstream.

Issue 2: Decide when we need semanticDB data

explicit: user enabled semanticDB in the module via scalacOptions - the compile result will contain the semanticDB data files and we should not not apply any extra processing
implicit: user uses Metals as the IDE - the compile task should not contain any semanticDB data files, as these are considered unwanted results (e.g. they should not appear downstream on classpaths or in jars)
no: we don't need semanticDB data at all - we should not generate it

Proposal for Issue 1:

Disclaimer: proposed task names are not final but choosen to make the concept clear

create a new persistent compileWithMaybeSemanticDb task which does the actual compilation, and include semanticDB data if we, for some reasons, need them.
Let the compile task use the result of compileWithMaybeSemanticDb but filter out semanticDB data, iff it was not explicitly requested by the module configuration, e.g. via scalacOptions.
Let the semanticDbData task use the result of compileWithMaybeSemanticDb and filter out any non-semanticDB data.
We keep the current concept of well-separated tasks with well-defined results. All downstream users, esp. the BSP client or mill-scalafix plugin keep as-is, but better performing.

Ideas for Issue 2:

Maybe too simple, but we could always generate semanticDB data in compileWithMaybeSemanticDb and just don't use it downstream if nobody is interested in it. This has an overhead of up to 20 percent in case nobody is going to need it. (Very unlikely, but it may also conflict with other compiler plugins and fail the compilation that otherwise would succeed.)
Smart-decision for semanticDB data need. Either project use of the semanticDbData task or any BSP use should permanently enable it. This must be a bullet-proof design, well-documented and users need a way to disable it (opt-out or opt-in). To detect BSP, we should just use the fact that a Mill-generated .bsp/mill-bsp.json file is present, since this won't require any extra book keeping. Users, who don't want to use BSP can also safely remove that file. We could also write an extra file next to this location, so we can check its age, for example. Once the semanticDbData task is used/planned, we may record that fact under the namespace of a dedicated module-specific persistent task semanticDbDataGenerationWanted. This has the issue that the semanticDbData task is a downstream dependency of compileWithMaybeSemanticDb task and we currently can't know if the initial value is correct (so factually we always need to guess "enabled"). It would be really cool to have some way to detect early what the user is going to run. E.g. if we could expose the current execution plan via the TaskCtx. That way we could have an early persistent task semanticDbDataGenerationWanted that decides based on it's persistent state and the fact whether the semanticDbData task is requested. Then we could conservatively default to disabled, unless there is more evidence.

Oct 26 '25 17:10 lefou

@arturaz I think we should try a bit to simplify this. Some feature-flagging on and off is inevitable, but some of the interactions and dependencies introduced here are a bit obscure and unusual

Because regular mill now needs to know the location of the BSP folder, having one variable is problematic:

Why exactly does regular Mill need to know the location of the BSP folder? This is a bit of a weird interaction/dependency that I'd like to cut if at all possible

The reasoning is that if at least one of Mill's clients requests semanticdb, then there is no point in running regular compile without semanticdb, as eventually we will have to rerun it with semanticdb, and thus we should compile with semanticdb upfront to avoid paying the price of compling twice (without semanticdb and then with it).

Rather than than trying to detect this dynamically, let's just hard-code the needs-semanticdb-or-not decision on process start: Metals would spawn a BSP process needing semanticdb, and IntelliJ would spawn a BSP process not needing it. If IntelliJ re-uses a metals process or vice versa, worst case is there may be some inefficiency (either running compile unnecessarily, or generating semanticdbs unnnecessarily), but I think the simplicity is worth the tradeoff.

Last thing is how was this tested. Manual testing and observing a speedup? Add some integration test to make sure that some BSP workflow doesn't call semanticDbDataDetailed? Whatever it is, it should be mentioned in the PR description

I think those are the three last things before we merge this

Nov 02 '25 09:11 lihaoyi

Move the semanticdb generation logic to `compile` and make sure it's shared where possible

Pre-PR situation

Post PR situation

Implementation details

Introduction of MILL_BSP_OUTPUT_DIR

BuildCtx.bspSemanticDbSessionsFolder

CompilationResult.semanticDbFiles

Removal of CompileFor and related tasks

Removal of separate SemanticDbJavaModule.semanticDbDataDetailed task

Replace the implicit Task.dest usage in ZincWorker with explicit compileTo argument

Misc changes

Issue 1: Bad compilation performance

Issue 2: Decide when we need semanticDB data

Proposal for Issue 1:

Ideas for Issue 2:

Introduction of `MILL_BSP_OUTPUT_DIR`

`BuildCtx.bspSemanticDbSessionsFolder`

`CompilationResult.semanticDbFiles`

Removal of `CompileFor` and related tasks

Removal of separate `SemanticDbJavaModule.semanticDbDataDetailed` task

Replace the implicit `Task.dest` usage in `ZincWorker` with explicit `compileTo` argument