mill Overhaul Mill BSP Support

It looks like the Mill BSP subsystem will need to be refactored quite a lot for https://github.com/com-lihaoyi/mill/pull/4864 to accommodate build.mill classpath isolation, so I'm opening this issue to discuss any other changes we would like to make for it. The codebase for mill-bsp has never been in great shape, and stability has never been great, despite multiple efforts to clean it up over the years. This ticket is an opportunity to discuss if there are any fundamental changes we would like to make to it to move things to a better place

Some ideas off the top of my head:

We need a narrower interface between mill-bsp and the individual build.mill files, to support https://github.com/com-lihaoyi/mill/pull/4864, that doesn't involve juggling tasks and evaluators itself.
BSP requests should really go through the standard MillBuildBootstrap logic each time, rather than having its own in-memory Seq[Evaluator] cache and other global mutable state that it has to manage.
The BSP server should be a top-level entrypoint to the Mill process, rather than being generated deep inside MillBuildBootstrap and stored/re-used globally
Do we still need the Bloop code path? Maintaining two separate BSP integrations leads to a lot of confusion, higher maintenance burden, and lower quality for either integration
Do we need to have separate lifecycles for MillBuildServer vs runSession, or can they be combined?

CC @alexarchambault @lefou in case you have any ideas

Apr 06 '25 07:04 lihaoyi

Thanks @lihaoyi for opening this. I experimented a lot to make BSP more modular and also think we need a design change.

There are some requirements for our BSP implementation.

It should be as fast as Mill native CLI
it should cache as much as possible via Mill features
It has to be a single long running process started by the BSP client
requests should be easily testable, on the module as well on the request level

Let me shortly recap our current BSP server implementations.

Initial Design (history)

The initial design mapped the whole BSP server to a large evaluator command. This led to a slow startup without any chance to communicate to the BSP client, resulting in error on the BSP client side. There were no classloader separation or BSP extension support.

Second Approach (current)

Since the BSP server must not block when receiving requests from the BSP client, esp. in the startup phase, we moved the BSP startup logic into mill main, and injected the evaluator(s) as soon as they were available. The BSP server was able to establish a connection to the BSP client rather immediately. But the code is gnarly and hard to read or change. It also does not support BSP extensions dynamically.

Next BSP Design (my ideas)

I think we should split up the entry points for Mill and Mill BSP.

We should have a small leightweight BSP server main, that just sets up communication with the BSP client and keeps a BSP session to handle the lifecycle.

It should itself be a rather leightweight Mill client.

It should handle all BSP protocol requests, translated them into Mill task requests (probably commands) and forward them to our Mill server.

All tasks results are lifted back into BSP responses.

Consequences

No mix-up of different responsibilities in Mill client.
The Mill BSP server just handles the protocol and maps to mill calls.
To be able to cache BSP4j protocol entities (without translating them back and forth), we need to provide a full upickle JSON mapping for BSP classes. Reusing bsp4j classes is unfortunately blocked by issue https://github.com/build-server-protocol/build-server-protocol/discussions/651 and https://github.com/build-server-protocol/build-server-protocol/issues/668 (personal note: BSP received almost no contributions since its move to Bazel). I suspect, we need to prepare our own mappings, either in Mill or upstream (with Smithy).
Due to caching on the module level, BSP Extensions, which are module-specific, should be able to access the extension specific APIs and workers, which is a win
We probably need to be able to dynamically load BSP extensions which are not part of the bsp4j releases in Mill BSP. (No need to implement it right now, but we should not make it an afterthought.)
If Mill BSP needs to cache itself a lot, we could use/generate an internal Mill project and use a dedicated internal out-dir. We could then re-use cached protocol responses in case the Mill project itself has build issues (https://github.com/com-lihaoyi/mill/issues/4868).
(I think this should finally work. It would be great if we could mimic the full BSP protocol on the CLI with commands. To check what's wrong in the IDE, it would then be possible, to e.g. query for all BSP build targets or the env-settings for a specific module.)

Apr 06 '25 08:04 lefou

4. Do we still need the Bloop code path? Maintaining two separate BSP integrations leads to a lot of confusion, higher maintenance burden, and lower quality for either integration

I think we should keep it. The BSP support in Bloop is somehow the reference BSP implementation for Scala projects (it's the one that works best, even for other build tools - it only lacks things like code generation or automatic reloading if the build changes, that is, things that motivate having BSP server in build tools themselves). It's good to have it as a reference, to compare to the BSP support in Mill.

5. Do we need to have separate lifecycles for MillBuildServer vs runSession, or can they be combined?

It seems they follow the same lifecycle currently, but handle two different concerns, that should have different lifecycles in the future IMO. MillBuildServer is created when the build tool starts Mill for BSP. It should live as long as the command that the build tool spawns runs. runSession passes the evaluators to the BSP server. We don't right now, but I think we should update these evaluators every time the meta-build changes. Meaning we should do the "update evaluators" part several times during the life of a single MillBuildServer. I've started working on this here (will try to open PRs for it rather sooner than later, given you've been working on BSP too).

My understanding is that stopping and starting a new MillBuildServer on our own when the meta-build changes isn't really an option: that would drop any queued requests, which would be a problem.

Apr 06 '25 14:04 alexarchambault

Among extra requirements for BSP, I'd add one point discussed around https://github.com/com-lihaoyi/mill/pull/3683#discussion_r1797104302:

the BSP server should watch the meta-build sources, and update the evaluators upon new changes, and notify the build client with onBuildTargetDidChange (so that build clients can reload updated modules, discard deleted ones, etc.)

I'd add to that that it'd be nice if:

the BSP server could start fine even if the meta-build is broken (we should at least get valid modules up to the meta-build level that's broken, so that the IDE can help users fix it)
if possible, during the MillBuildServer lifetime, it should keep former module definitions in the main build or meta-build below a broken meta-build level, so that users can keep working on those even if a meta-build level higher up is broken

I should have the first and second point above working fine in work here. I might have the third one work too, but need to test further what I have.

Apr 06 '25 14:04 alexarchambault

Still thinking about @lefou's points, but I'm wondering in what extend BSP-specific caching is really needed. Mill tasks already rely heavily on caching, BSP should only be a lightweight layer on top of that.

Apr 06 '25 14:04 alexarchambault

the BSP server should watch the meta-build sources, and update the evaluators upon new changes, and notify the build client with onBuildTargetDidChange (so that build clients can reload updated modules, discard deleted ones, etc.)

We'll get this for free if we can unify BSP and non-BSP codepaths, since normal Mill --watch already watches meta build sources and reacts to changes there

Apr 06 '25 15:04 lihaoyi

My understanding is that stopping and starting a new MillBuildServer on our own when the meta-build changes isn't really an option: that would drop any queued requests, which would be a problem.

Yeah my experiments with restarting the BSP server every reload made IntelliJ unhappy, presumably due to the downtime during reloads. But how about changes like "new mill version" that require a process change? Presumably we should have some strategy to handle those

Apr 06 '25 15:04 lihaoyi

the BSP server could start fine even if the meta-build is broken (we should at least get valid modules up to the meta-build level that's broken, so that the IDE can help users fix it)

This should also be straightforward since https://github.com/com-lihaoyi/mill/pull/4873 which moves the BSP server instantiation out of mill.bsp.BSP.startSession into MillMain. The normal MillBuildBootstrap logic already is able to run a partially-broken build provide a RunnerState containing the stack of Evaluators for the levels that succeeded. We just need to make use of those Evaluators appropriately

Apr 06 '25 15:04 lihaoyi

Still thinking about @lefou's points, but I'm wondering in what extend BSP-specific caching is really needed. Mill tasks already rely heavily on caching, BSP should only be a lightweight layer on top of that.

The caching part of the BSP server was meant to be optional ("If Mill BSP need to cache itself a lot"). I too think, the BSP should we a lightweight thing, but my perception for using IJ is different. Even just getting to the list of module sometimes takes much longer than expected.

Apr 06 '25 15:04 lefou

I want to boost, why I think we should make the bsp4j classes (or what they represent) upickle-JSON-serializable.

The BSP protocol mostly defines bulk requests, and we often traverse modules to find or compute the specific answer to the request and multiplex it back into a bulk-response. As an example, the buildTarget/sources just has a list of modules. Our implementation then collects the required info from each module in form of a SourcesItem and creates a SourcesResult. IMHO, it would be a Mill-intrinsic thing to just return the SourcesItem from a task in each module. It may be not that much of an overhead but it is overhead nonetheless to compute these information over and over again. Beside the overhead, cached tasks makes it much easier to inspect sub-results in case something goes wrong or continue with a subset in case something broke (see also https://github.com/com-lihaoyi/mill/discussions/1779).

Apr 06 '25 16:04 lefou

@lefou If you really want to serialize bsp4j classes, isn't it possible to use gson in some upickle readers / writers? Like having upickle give us the raw string of a JSON object, and we'd pass that string as is to gson (and the other way around, get a string from gson, and pass it as a "raw object" to upickle). Such things are possible with jsoniter-scala, with a simple wrapper type, like here.

The bsp4j classes are meant to be serialized / deserialized with gson, we could just rely on that.

Apr 06 '25 16:04 alexarchambault

@lefou If you really want to serialize bsp4j classes, isn't it possible to use gson in some upickle readers / writers? Like having upickle give us the raw string of a JSON object, and we'd pass that string as is to gson (and the other way around, get a string from gson, and pass it as a "raw object" to upickle). Such things are possible with jsoniter-scala, with a simple wrapper type, like here.

The bsp4j classes are meant to be serialized / deserialized with gson, we could just rely on that.

Sure. But since the protocol classes are currently in the same dependency as the whole BSP server implementation, we'd force a lot of unwanted dependencies into the classpath of our modules. That's why I linked these open issues above. Maybe we can relax a bit, once we get the classpath separation from https://github.com/com-lihaoyi/mill/pull/4864, but in general, I'd like to avoid dependencies nobody used and which are known to cause trouble for users, like guava.

Apr 06 '25 16:04 lefou

I disabled the Bloop integration as part of https://github.com/com-lihaoyi/mill/pull/4879 to unblock the classpath isolation work, to be re-enabled if we decide we really need it.

IMO if Bloop isn't absolutely crucial, we should drop it. Maintaining 2 working IDE integrations is hard enough, adding a third redundant integration really makes the maintenance burden much greater in all aspects: multiplying code paths, confusing users, confusing maintainers trying to respond to filed issues when nobody is sure which of the two code paths is being used. I myself have been confused multiple times trying to choose BSP vs Bloop when setting up VsCode, so users who aren't experts in Scala IDEs will have zero hope of figuring it out at all.

Bloop was a good thing to have a few years back when Mill's own BSP support didn't exist, but now that mill.bsp works we should double down on it and make it really work, rather than hedging our bets with two half-broken/half-maintained IDE integrations and all the associated confusion and complexity

Apr 07 '25 15:04 lihaoyi

One topic worth discussing is what our policy is for using classloader isolation in mill-runner.

Previously, mill-runner was on the build.mill classpath, so it made sense to put heavyweight dependencies into isolated classloaders to avoid polluting the user-facing classpath. Currently we do that for mill-bsp-worker and mill-runner-worker.

Now, mill-runner is separate from build.mill, so the user-facing benefit of having a cleaner classpath no longer applies. There is still some benefit from isolating potentially-incompatible dependencies, but that's usually a non-issue (at least until we hit some concrete diamond dependency situation causing problems!)

If we no longer perform classloader isolation of different parts of mill-runner, we can simplify the code greatly by removing the isolation from mill-bsp-worker and mill-runner-worker and just using them directly as libraries inside mill-runner. And if we hit diamond-dependencies in mill-runner in the future, we can decide then whether or not to re-introduce classloader isolation

If we do want classloader isolation inside mill-runner, then that begs the question of what form it will take. The current approach of just ad-hoc resolving stuff from maven central using Coursier in various helper methods is super confusing. Would we want some kind of more structured framework for writing this kind of "top level" component?

Apr 09 '25 12:04 lihaoyi

From looking at the mill-runner-worker it's probably something we don't need to isolate. It's just the additional Scala compiler and in general under our control, so we don't need to expect large classpath changes while preserving a stable API. (I wonder, if we could move parts of it into our JvmWorker.)

The mill-bsp-worker OTOH is not entirely under our control. It depends on external dependencies (like guava) known to be heavy and problematic when shared. We also want to be able up update or totally change it's classpath while keeping the Mill API stable. I'd keep this worker isolated.

About the idea of having some API for loading Mill extensions (loading jars into isolated classpathes with shared interfaces), I like the idea. But I think we have a lot of todos already and this one is most likely an API addition which we can bring at any time, not necessarily now.

Apr 09 '25 14:04 lefou

Going to close this since the discussion has stopped and a lot of the discussed improvements have landed

Jul 04 '25 08:07 lihaoyi