mill More refactorings for multi-build-sc files

This PR overhauls how subfolder build files are handled. Most of the application code changes in https://github.com/com-lihaoyi/mill/pull/3213 are reverted and replaced by new logic, but the changes to the test code remain.

Overall, this PR replaces package objects with normal objects, replaces the previous "wired up via classpath scanning" logic with "wired up via code-gening references", and replaces the previous "RootModule handling at runtime" logic with "RootModule handling at compile time". This should overall make the subfolder module.sc file handling both more robust and more ergonomic

Replace the package objects previously generated for root modules with normal objects named build or module (reflecting the file names)
- package objects are on their way out in Scala 3 https://docs.scala-lang.org/scala3/reference/dropped-features/package-objects.html. No concrete timeline, but good to stop using them. They also are a common source of bugs around compilation, incremental compilation, etc. since even their present-day implementation isn't terribly robust
- This also removes the need to refer to build as build.package since it is now a normal object
Generate final def aliases in the generated objects in order to make code references and module discovery work
- These aliases allow both Scala-code references to modules without needing this .build or .module suffix, as well as allow the Resolve.scala logic to work
- We can thus eliminate a whole bunch of gnarly code plumbing multiple-root-modules all over the Mill codebase and gnarly classpath-scanning code to try and identify the sub-folder root modules and wire them up.
- These aliases should more reliably generate conflicts than overlapping package object/objects, which seemed pretty flaky
We move the handling of RootModule from runtime reflection/classpath-scanning/custom-resolve-logic to a compiler plugin that "unpacks" any object module, object build, or object foo extends RootModule into the enclosing wrapper class
- We cannot do this as a naive source-code transform, because we need to preserve line numbers while performing non-trivial transformations (e.g. moving all the statements inside the object module out of it, merging the extends clause to the enclosing class PackageClass)
- This allows us to ensure RootModules are handled uniformly from Scala code and from the command line, where previously Scala code would need to write bar.qux.module.mytask suffix while the CLI would just write bar.qux.mytask
- By making the unpacking name-based, rather than type-based, this removes the "what should we name the root module?" degree of freedom that really shouldn't be necessary and ensures everyone names them consistently

Limitations:

Partially migrating subfolders out of a parent build.sc into a child module.sc no longer works. The aliases we generate for subfolder module.sc files always conflicts at compile time with any locally-defined modules
The error reporting for things that don't extend RootModule or MillBuildRootModule is pretty primitive, and does not understand subclasses or aliases since it relies on the untyped AST. This can be fixed by moving it to run after typer, but for now it should work for most cases
module.sc modules have two different names, e.g. build.foo.bar can be referred to as millbuild.foo.bar.module. It's a bit annoying, but it means build.foo.bar is somewhat consistent with the command line foo.bar, and foo.bar.module is consistent with foo/bar/module.sc. We could consider changing things to be consistent one way or another, but the basic things that we need to consider are:
- What the file is called? foo/bar/module.sc
- What the module is called at the command line? foo.bar
- What the module is called in code? build.foo.bar, or millbuild.foo.bar.module

Aug 27 '24 08:08 lihaoyi

This is probably ready for review

Aug 29 '24 11:08 lihaoyi

After some playing around, I think going with objects named package rather than module seems like the right thing to do here. These are already well-supported by both IntelliJ and VSCode, and minimize the amount of work that would need to be done to make subfolder build files work.

In fact, if we're willing to add a package declaration to the top of each file, IDE support works out of the box today if we name the root module object package rather than object module: both IntelliJ and VSCode are able to jump to definition to the root module of each folder when its named object package, as well as to the contents of each module. In contrast, using object module, we would need to teach Intellij how to jump around ourselves, which I expect to be a lot more work to implement and maintain in multiple IDEs

Having the files named package.sc also has the advantage that it looks very similar to the package.scala files people are already familiar with, that are used to denote the "primary" file in a particular folder. So it's probably not just IDEs that would find them more familiar

We can probably fiddle with the implementation by transforming the trees in our compiler plugins, but as a user-facing and tool-facing API I think package.scala with object package seems like the right way to go if it can be made to work

Aug 30 '24 01:08 lihaoyi

So, once we start to use packages, how can we access objects from the default (empty) package? Isn't this a JVM restriction? Don't we need some package for the root build object too?

Aug 30 '24 08:08 lefou

@lefou I updated the PR description with the current approach and results

Aug 31 '24 08:08 lihaoyi

This is a massive change. Unfortunately, my time to review and answer is short, so here are just some thoughts. There is a not so slight chance that I just follow my human urge to resist change. I hope not. I'm just discussing things for now.

File extension

Since build files don't aim for dedicated editors but instead try to be normal scala code, we should also use a normal Scala file type. (Is a pom.xml called pom.maven?) We should consider making it build.mill.sc or mill.sc. We could also try to use a using directive indicating it's a mill file.

No longer explicit imports

I don't like the fact that we now randomly pick up all .sc (.mill) files. Mill was always on of the good tools with a single place to inspect to get the whole picture. By default there were no additional files to consider. Unless you find an explicit $file-import, then look into that file too, or a $meta-import, that also consider a meta-build. Or an (not yet implemented) $submodules-import, ok, then also look into subdirectories. After your change, a user, a human or a tool needs to scan the whole directory and it's subdirectories to get the whole picture.

I don't think the arguement, that the scala compiler is doint it the same counts. Sourcecode is typically organised in dedicated directories. But build tools typically don't dictate what's in a project directory alone. It might follow conventions, but should be flexible (e.g. to adapt to the needs of different tools or managers). The directory containing the build files for Mill also have other content, meant for different tools.

We easily could limit the existing file import support to the build.sc and module.sc files, to avoid deep import traversals to detect all imported scripts. In the end, all $-imports are just magic shortcuts for the meta-build. Our meta-build is effectively supporting directory and single-files as it's sources.

IDE Support

I also don't seen the "hackines" in existing IDE support from Mills side. Or if I understand it your wrong, we should not optimize for the quirks of IDEs. The build project itself is a dedicated BSP build target listing all it's sources (the main build and the module files). If that doesn't work, that's because of a IDEs not fully supporting BSP in the sense, that it collides with dysfunctionalities in the IDE. E.g. associating a file in a subdirectory to one build targets, but another file next to it to another build target. But from a BSP perspective it should work.

Java Packages

I do see why the use of packages make things more natural to the language. I also see the technical reasons why we can't use the default package for the root project. But maybe we should use a more special root package? Something like _mill_. This is also to avoid collistions with imported libraries. Can't we let the user choose the root package name so she is able to avoid collisions? We could just pick the package of the root project and append it when we specify targets on the cli.

We could also assume, that a root build without a package does not support sub-packages and therefore we have a intuitive way to enable/disable submodules.

Finally

I struggle with the holistic approach we take here. We enforce all supported files in a project to follow the rules of a single tool. Other build tools like make allow me to pick a dedicated buildfile. Makefile is just a convenient shortcut. That allows me to experiment by copy and editing and the run make -f MakefileTest. I always hoped I can extends Mill to support the same. But when we no longer control the files we use but instead just pick up what we find, we more or less absent from that goal.

Aug 31 '24 13:08 lefou

Yeah this is a huge change. Nevertheless, I think it's one that is really necessary. It's definitely a big change to dump on you, but do give it a chance!

Apart from the philosophical differences, the migration cost should be reasonable: .sc files are still supported, import $file.foo.bar becomes import build_.foo.bar, and we could even add back import $file as an alias for import build_ as a backwards compatibility shim if necessary. The mill-bootstrap.patch changes show we need 5 lines changed for Mill's 2000-line build config, which isn't nothing but is also not a huge burden. If we alias$file/build_ as mentioned above we'd be down to 3 lines changed out of 2000.

A lot of these changes are to try and improve Mill's "IDE support", which IMO is worth a lot. Previously when everything was in one file, even half-broken IDE support was ok since the IDE could resolve most things locally in the same file. But if we want to properly support multi-file builds, we'd basically be going from "decent IDE support" to "totally broken IDE support", which isn't a good value proposition at all. Especially as we go outside the Scala community, "totally broken IDE support" is a non-starter. But with this PR, we just need a 1-line PR to IntelliJ and VSCode to add the file-extension-association, and we're done! Everything works even in multi-file builds!

I agree that we shouldn't be limited by the quirks of IDEs, but I think we need to be pragmatic in terms of timing and ordering. Mill's IDE support has not significantly improved in the 7 years the project has been around, so I don't think we can rely on "IDE catching up" to any changes we make: even BSP is really just another way to receive the same semi-broken support we used to have via GenIdea, where the IDE doesn't even know if build.sc is a Mill build, Ammonite script, Scala-CLI script, or IntelliJ/VSCode Worksheet. I also don't have the resources or expertise to go patch IntelliJ and VSCode separately to teach it about all of Mill's own quirks.

Many of Mill's quirks also go against how every IDE out there works. e.g. The idea that this foo.sc file is a Mill file, but that bar.sc file is a Scala-CLI script, depending on who ends up importing it, is something that no IDE will easily support. Every IDE works by associating file types with files in folders matching name/extension regexes, and no IDE does import graph analysis to figure out which file belongs to which file type. Even in languages like Python, where "whatever you import" is the way the platform works, IDEs still want "files in folders with a given extension" as the unit of analysis if you want them to give you useful code intelligence

Of course it's always theoretically possible to implement whatever you want in an IDE, but by being so unusual Mill makes it especially hard for IDE maintainers to provide a decent experience, which concretely means receiving Mill-specific IDE support later rather than sooner. Or never.

In terms of naming, pom.xml is AFAIK the exception. There are also a few multi-tool formats like pyproject.toml, and things like project.clj. But most build files are named after their tool: Makefile, Dockerfile, Jenkinsfile, build.gradle, BUILD.bazel, BUCK, pants.toml, Cargo.toml, build.sbt, webpack.config.js, vite.config.js, build.boot. Basically every build tool out there is <project-name><suffix>, <project-name>.<extension> or build.<project-name>. We could have build.mill.sc or build.mill.scala, similar to vite.config.js, but ultimately following build.sbt or build.gradle seem fine as well

If you want to easily change the file used by Mill, that is something we can implement even with this PR. I don't think these changes really restrict us from either allowing different names for the root build.sc or for other files in the codebase. If we want to allow ./mill -f build.mill2 and have it pick up .mill2 files instead of .mill, or whatever other override mechanism we want, we should be able to implement it without issue

Basically if we're fine with Mill builds being largely in one file, then the status quo is acceptable. But if we want Mill to work well when the build is broken into multiple files, then this PR is necessary to make it usable in practice. And we do need multi-file builds for Mill to be useful for non-trivial work; even the 1-2 person com-lihaoyi/mill codebase has its 2000+ line build file starting to become unmanageable, I can't imagine what would happen in 10-20 to 100-200-person projects

Aug 31 '24 13:08 lihaoyi

I added in the build_/$file migration compatibility measures mentioned above, along with making package declarations optional for the old .sc file extension. Now, ci/mill-bootstrap.patch just adds a single empty ci/package.sc file to allow ci/upload.sc and ci/shared.sc to be discovered

diff --git a/ci/package.sc b/ci/package.sc
index e69de29bb2..6a5b3e8b8e 100644
--- a/ci/package.sc
+++ b/ci/package.sc
@@ -0,0 +1 @@
+

Sep 01 '24 15:09 lihaoyi

Did some more updates for migration compatibility. As of this comment, the older .sc file behavior is largely preserved, including import $file-based discovery and semantics, and so ci/patch-mill-bootstrap.sh is empty. The new .mill files have the new requirements, including the need for package declarations and package.mill files.

The new compile-time unpacking of RootModules and the need for the root module to be named package applies to both .mill and .sc files.

Sep 02 '24 07:09 lihaoyi

Going to merge this and proceed with the dogfooding, rebootstrapping, and cutting of RC1. There's definitely issues we'll discover, and further things we need to do. But that can happen during the RC period

Sep 03 '24 06:09 lihaoyi

.scala as an extension could possibly work from a technical level, but there will still be user-facing confusion if we want Mill to be able to target non-Scala developers, and possibly technical confusion for tools that can't differentiate between Mill build files and application source files

Have you considered something like .mill.scala?

This way most tools (e.g. bat) will continue working, but I think that solves the concern in the quote?

Sep 03 '24 09:09 nafg

.scala as an extension could possibly work from a technical level, but there will still be user-facing confusion if we want Mill to be able to target non-Scala developers, and possibly technical confusion for tools that can't differentiate between Mill build files and application source files

Have you considered something like .mill.scala?

This way most tools (e.g. bat) will continue working, but I think that solves the concern in the quote?

That's a great idea! Otherwise, even the most basic syntax highlighter needs to know what Mill is. This happens already for sbt, but .mill.scala could disambiguate Mill files while avoiding making all editors in the world that are aware of Scala, having to know that .mill is an extension for Scala.

I see the value of hiding Scala to Java developers, but it is no doubt that the build file is a Scala file and to master it you need to know some Scala :)

Gradle started out with build.gradle and now it is migrating to Kotlin and it's build file is called... build.gradle.kt, which is exactly equivalent to build.mill.scala. It's another example which makes me think it is the right thing to do.

Sep 03 '24 15:09 lolgab

I considered .mill.scala and .mill.sc, but decided to go with .mill

I think net-net, the cost of adding a new file extension is not that large. Sure, initially you get zero support and everything sucks, but most editors allow you to associate a file extension manually, and it's usually a trivial PR to each of the various editors or environments (e.g. Github linguist) to associate the new extension.

The major thing that I think we need to be more careful about is the semantics. Teaching an editor a a file extension is trivial; teaching an editor custom file-type identification logic or scoping/identifier-resolution logic (which is what would be needed to properly support the old import $file-based scripts) is much more difficult. There's a reason that after 7 years, IntelliJ still doesn't know whether something is an Mill build, Ammonite script, Scala-CLI script, or IntellIJ worksheet. And it's not for lack of trying: the IntelliJ folks implemented import $file support (kinda), let it bitrot, and had to be pestered to reverse engineer it later to bring it back (the original implementors had already left). Better to make things easier for them so they don't need to jump through so many hoops to support Mill.

Overall, a new file extension seems big, but I bet in the next week or so we'll be able to send PRs to most of the major ones, and that's that. By the time 0.12.0-RC1 is even cut it should already be a non-issue. Whether we use .mill or .mill.sc or .mill.scala then becomes immaterial. After that the benefits of conciseness (.mill.scala is quite a mouthful) and encapsulation (it's a Mill script, not a Scala application) win out

Sep 03 '24 15:09 lihaoyi

There's a reason that after 7 years, IntelliJ still doesn't know whether something is an Mill build, Ammonite script, Scala-CLI script, or IntellIJ worksheet.

I guess that's because build.sc files were already supported as Ammonite script very early. And even Ammonite re-used .sc from workbooks before.

You make it sound easy, but making a new file extension recognized is hard, esp. if there are already other file extensions for the same type. (I know many editors prefer to guide the user to use one standard instead of many, e.g .adoc vs .asciidoc.)

While *.sc files are scripts paired with some semantics, *.scala files are never standalone, they need a version and a classpath to have any support in IDEs. And any IDE will need to have this support for Mill anyways, otherwise it's just syntax highlighting, which would work reasonably better with .mill.scala.

Sep 03 '24 19:09 lefou

I think net-net, the cost of adding a new file extension is not that large. Sure, initially you get zero support and everything sucks, but most editors allow you to associate a file extension manually, and it's usually a trivial PR to each of the various editors or environments (e.g. Github linguist) to associate the new extension.

I think this equation is wrong. What I know, avoiding the .mill extension on our side is rather cheap. Compared to the count of editors on the planet. And even if you get your PRs accepted eventually in five years to all these editors you think are relevant, you often can't use the latest version or need sometimes work with even another editor or a formatter or tools/libs like editorconfig. We even don't have an idea what editors are used, e.g. I often use a simple CLI editor like nano to make bumps in maintained projects. Some editors I use are implemented in languages I'm not proficient in. I don't think opening a PR or whatever the workflow needed for it is easy at all. Let alone the social factor. How to convince an arbitrary editor maintainer who has to maintain hundrets of languages, that she needs to add yet another extention for Scala, just for a little niche build tools she never has heard about? How to convince anyone, if we even don't agree in our team? You're choice seems pragmatic, but dealing with it for others is everything but easy. We loose what we already had.

Sep 04 '24 11:09 lefou