mill `bspBuildTargetScalaMainClasses` and `bspBuildTargetScalaTestClasses` depend on `compile` rather than `semanticDbDataDetailed`

If the BSP client asks for main/test classes, we trigger a regular compilation, even though we already have a finished compilation with the semantic db data. This wastes resources.

We should make it depend on the semantic db data task, as that's the main thing when running in BSP mode.

Aug 26 '25 07:08 arturaz

This is only relevant for Metals, which requires semanticDb, but IntelliJ IDEA does not. Our BSP implementation should already be able to decide whether compile or semanticDbDataDetailed is needed.

Aug 26 '25 08:08 lefou

Our BSP implementation should already be able to decide whether compile or semanticDbDataDetailed is needed.

I don't think it does? I am on my phone now, but it seems that it just depends on compile task.

Aug 26 '25 08:08 arturaz

I meant, our BSP server implementation is already aware, whether we need semanticDb data or not. We have that info in MillBuildServer.SessionInfo.clientWantsSemanticDb and it should be respected in the compilation request. There may be bugs though.

Aug 26 '25 08:08 lefou

https://github.com/com-lihaoyi/mill/blob/8d204315655e510d2fcd3c48dc431b09be474daa/libs/javalib/src/mill/javalib/bsp/BspJavaModule.scala#L135-L143

invokes

https://github.com/com-lihaoyi/mill/blob/007f75c8ae343a9fbbafd77dcb1779acde82882c/libs/javalib/src/mill/javalib/RunModule.scala#L78-L83

which invokes

https://github.com/com-lihaoyi/mill/blob/ca6e3f9cb070c1e860b8b261bd7559658f70d323/libs/javalib/src/mill/javalib/JavaModule.scala#L923-L930

which invokes

https://github.com/com-lihaoyi/mill/blob/ca6e3f9cb070c1e860b8b261bd7559658f70d323/libs/javalib/src/mill/javalib/JavaModule.scala#L844-L881

which invokes

https://github.com/com-lihaoyi/mill/blob/ca6e3f9cb070c1e860b8b261bd7559658f70d323/libs/javalib/src/mill/javalib/JavaModule.scala#L975-L980

So it seems like it doesn't compile things with semantic db.

From discussion with @lihaoyi:

the whole BSP/non-BSP things like a bit of a mess, I wonder if instead of having separate tasks for everything we should make everything the BSP server does generate semanticdb

I agree here. I would like to propose an alternative to the current status quo.

First, some assumptions I'm working with:

As I understand, semanticdb is just a regular compilation + a compiler plugin that is attached that extracts information during the compilation process and writes it to files.
That means that anywhere where you would need an output of regular compilation you can use output of semanticdb compilation.
You cannot produce semanticdb from already compiled sources, so even if you have a cached compile, you have to rerun the compilation with semanticdb enabled.
We do not enable semanticdb on everything because it slows down compilation. From my local experiments module X takes ~31s with compile and about ~39s with semanticdb compile, which seems to confirm this assumption.

With this in mind, I propose that:

we remove the separate semanticDb tasks. Having two branches (compile vs semanticDbData) is tricky to get right, as every task in the task graph has to now know for what purpose it's being called (regular vs semanticdb compilation). As we can see from the previous call graph that I've listed, it's quite hard to get right.
we somehow check whether semanticdb needs to be produced in the compile Task, without passing it explicitly. This could perhaps be an environment variable, or some other sort of global flag? Maybe Task context? The point of global value is to avoid rewriting all of the tasks in the graph that lead to compile
if we compile and semanticdb isn't needed AND we have no previous compilation, we just do the compilation.
if we compile and semanticdb isn't needed AND we have previous compilation with semanticdb, we can use the cached compilation.
if we compile when semanticdb is needed AND we have a previous compilation without semanticdb we need to recompile with semanticdb.

There's a question on what to do when out/ is shared between regular mill and BSP mill. Imagine this scenario:

regular mill performs compile without semanticdb
Immediately after bspmill performs compile with semanticdb, essentially throwing all of the work that step 1 did out of the window.

I believe this can happen if you run Metals and in ./mill --watch yourApp.runBackground in the background.

If the order is reversed and bspmill runs first, regular mill would reuse the compilation.

To solve this there needs to be some sort of communication between the bsp mill instance and the regular mill daemon to indicate that BSP is running. I propose out/millBspProcessId file, which would get deleted upon bsp mill exit. Then, regular mill could check for it's existence (and whether PID is still valid) before invoking compile and produce a semanticdb compilation even if regular mill doesn't strictly need the semanticdb data, on a presumption that bsp mill will need it down the line.

@lihaoyi @lefou @alexarchambault thoughts?

Aug 26 '25 14:08 arturaz

I have lots of thought, but need to find time to read your writeup and answer properly.

In the meantime, please have a look at issue #1546, where I outlined three different ways to realize semanticDb support without sacrifying Mill's consistency.

https://github.com/com-lihaoyi/mill/issues/1546

We currently living with option b) which changed drastically since Scala 3 + semanticdb plugin no longer support -Ystop-after:semanticdb-typer option.

Hence, we should discuss option c), which we haven't tried yet.

Aug 26 '25 14:08 lefou

In the meantime, please have a look at issue #1546, where I outlined three different ways to realize semanticDb support without sacrifying Mill's consistency.

Thanks for the write up. It seems that my proposal is similar to your option c).

Aug 26 '25 14:08 arturaz

It would also be interesting, to find out why the Scala 3 with semanticDB toolchain no longer works with the -Ystop-after:semanticdb-typer options, since this was IMHO the best solution.

What exactly was the issue?
Was it related to the fact, that the semanticDB plugin is not built-in to the Scala 3 compiler? Should we ask the compiler folks, if that change is intentional?
Was it related to Zinc? Does is maybe work as expected when we use the Compiler API directly? Since we not really need (nor used) the incremental feature, using the non-Zinc compile might even produce faster results.
Was it related to incremental compilation?

Aug 26 '25 20:08 lefou

It would also be interesting, to find out why the Scala 3 with semanticDB toolchain no longer works with the -Ystop-after:semanticdb-typer options, since this was IMHO the best solution.

Probably. But even if we figure that out couldn't we use this to then generate semanticdb data on demand in c) solution, instead of full recompile?

It seems to me we can implement c) now and then, if things get better, improve it later.

Aug 26 '25 21:08 arturaz

It depends on the details of solution c).

The most forward way would be to modify scalacOptions and scalacPluginMvnDeps (or their mandatory variants) and in turn always just use the compile task in BSP. There are probably some nifty details I'm overlooking right now.

But we still need to keep the semanticDbData task functional, which is needed by downstream users like mill-scalafix plugin.

Aug 26 '25 22:08 lefou

One problem that needs to be solved if we enable semanticdb for normal compile (which is my favorite approach) is publishing. It happened in the past that I released on Sonatype jars with semanticdb files inside. We need to make sure we filter out semanticdb files from jars when publishing.

Oct 03 '25 16:10 lolgab