CMake Deprecation Path
Curious as to what the plan is for the continued use of CMake post-Rust migration, as it seems that CMake is already passing much of the work over to Cargo as is.
Would be great to remove another build-time dependency for those building in CI etc.
I think the latest writeup was https://github.com/fish-shell/fish-shell/pull/11583#discussion_r2150511320
AFAIK there are no blockers, it's mainly about porting a bit of code into installation scripts.
Yes, we should be able to move most things into separate scripts. One annoyance with cargo is that there does not seem to be a good way to run code after compilation.
AFAIK we don't know yet if we want to keep using make. At the moment we have a GNUmakefile and a BSDmakefile which is annoying, but they don't do much complicated work. I think we might be able to use cargo as the main build system and possibly call some shell scripts from build.rs scripts.
One thing which might be nontrivial to get without CMake are install locations, including for auxiliary files like completions and fish-script functions. We already allow building with the embed-data feature, which is enabled by default when calling cargo directly rather than via CMake. With this feature active, the files get embedded into the binary, so there is no need to put them in specific locations on the filesystem when installing fish. This doesn't work for gettext MO files which are used for localization, but we have PRs addressing that in the works. So when that is addressed, we could make all builds embed the required data, which would mean that we get a single binary which can be moved around arbitrarily and does not depend on any external files.
For what it's worth, the only reason I'm still using CMake for fish is to check if CMake builds still work after changing something. Otherwise, I use cargo directly for building and the build_tools/check.sh script for testing.
There are a few things for which we don't have a convenient CMake alternative yet, like building HTML docs (which is mostly just calling sphinx-build, but it depends on fish_indent, so it needs to run after building that), but for most things it just needs someone to spend some time porting the CMake code to an alternative.
Install locations and DESTDIR support are a pretty hard requirement for downstream packages. A small Makefile which knows how to drive cargo to generate the binaries and has an install target with the right variables seems like it would be enough. It's annoying to write in portable Make but I've mostly got it done for simple use cases. Looks a bit like https://gitlab.com/sequoia-pgp/sequoia-sqv/-/blob/main/Makefile
As an aside, I'm kind of amazed there's not already a consensus solution to this; is nobody shipping programs that are more than CLI utilities but not, like, a browser?
I am constantly impressed by the embed-data feature, but it's not very friendly to multi-user systems (yes, these still exist!) or those with their home directory on NFS. It increases the uncompressed binary size by 2.5x and compressed by 1.7x in my builds. I don't have a problem with it being on by default, but downstream packagers are probably going to want to turn it off.
If we keep supporting builds without embedded data it is indeed fairly annoying to get installation of the data files right.
I'm not particularly convinced by the multi-user argument. If the data files are shared between users, the fish executable should be as well, otherwise there could be mismatches between the binary version and the version of the data files. And if the executable is shared, it doesn't make much of a difference whether the data files are embedded in it or located elsewhere on the filesystem. Likewise for $HOME on NFS; if there is a system-wide fish install, it will not be in $HOME, so there should be no difference between embedding vs separate files. Regarding binary size, of course it gets larger when the data files are embedded vs separate, but combined storage usage for binary+data should not differ much between embedding the data and storing it separately. Am I missing some advantage of non-embedded files here?
The only potential downside of larger binaries I can think of is that load times might be increased. I don't know how eagerly operating systems load pages of the binary which are not accessed, and I haven't looked at the binary layout created by rust-embed, but my expectation is that the binary is lazily loaded and pages are only put into RAM on page faults, so I don't expect significant increases in runtime cost for the binaries with embedded data.
One theoretical advantage of separate data files is that they can be edited more easily, but I don't know if anyone would like to do that, and if so, if it wouldn't be better for our design to encourage upstream contributions instead.
I should say that I don't actually have a problem with continuing to ship a CMake build. I don't see much benefit in replacing it with cargo-make or another runner of some kind. cargo xtask, a shell script or a Makefile are probably reasonable approaches.
If the data files are shared between users, the fish executable should be as well, otherwise there could be mismatches between the binary version and the version of the data files.
I had it in my head that the embed-data feature was still doing the install-into-HOME thing, but that's wrong. I can't come up with any reason not to always use the embedded data if we can make it work for all the relevant assets. (It does smell a bit like the sort of thing that annoys system admins and Linux packagers, but the fact that all of the fish scripts and completions can be overridden reasonably easily is reassuring.)
There will still be some files that probably need to be installed by make/packaging tools like the fish.1 manpage and the pkg-config file, but that shouldn't be too difficult if the cargo build generates them.
I should say that I don't actually have a problem with continuing to ship a CMake build.
Having two separate build systems requires significant extra effort compared to only one, so the main benefit I see for dropping CMake support is that a lot of places can be simplified (e.g. the annoyance of always having to check for env vars set by CMake and falling back to cargo defaults for things like finding the correct place to put build artifacts). The effort for doing modifications to the build process is also much larger, not only because it has to happen for two build systems, but also because doing it for CMake requires being familiar with that or spending extra effort learning it, whereas people contributing to a Rust codebase will be familiar with Rust and cargo, so that part should be significantly more accessible.
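To illustrate the kind of fallback logic meant here, a minimal sketch (not the actual fish code). FISH_BUILD_DIR is used as an assumed name for a variable exported by the outer build system; CARGO_TARGET_DIR and the "target" default are cargo's own conventions. The helper names are hypothetical.

```rust
use std::env;
use std::path::PathBuf;

/// Treat an unset variable and an empty one the same way.
fn non_empty(value: Option<String>) -> Option<String> {
    value.filter(|s| !s.is_empty())
}

/// Decide where build artifacts go.
/// `outer` is the value an outer build system (e.g. CMake) may have exported
/// (assumed name: FISH_BUILD_DIR); `cargo_target_dir` is cargo's own override.
/// If neither is set, fall back to cargo's default "target" directory.
fn resolve_build_dir(outer: Option<String>, cargo_target_dir: Option<String>) -> PathBuf {
    non_empty(outer)
        .or_else(|| non_empty(cargo_target_dir))
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("target"))
}

/// Convenience wrapper that reads the real environment.
fn build_dir_from_env() -> PathBuf {
    resolve_build_dir(
        env::var("FISH_BUILD_DIR").ok(), // assumption: name of the CMake-provided variable
        env::var("CARGO_TARGET_DIR").ok(),
    )
}
```

With a single build system, the first branch (and every call site that has to think about it) could go away.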
The cargo xtask approach looks like it might be well-suited for our needs, probably preferable to make, since it does not add any dependency and allows writing arbitrary tasks in Rust.
I can't come up with any reason not to always use the embedded data if we can make it work for all the relevant assets.
Nice, always embedding would allow us to simplify some code dealing with both possibilities at the moment. For installing the files which need to be available outside of fish, we can probably use an xtask or make target that's a rewrite of cmake/Install.cmake. Setting the values currently obtained via CMAKE_INSTALL_* variables might not be trivial, I haven't looked into how these are set.
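A rough sketch of what resolving install directories could look like in an xtask, assuming we accept GNU-style variables. The variable names (PREFIX, BINDIR, DATADIR, MANDIR), the defaults, and the InstallDirs struct are all assumptions for illustration, not something taken from cmake/Install.cmake.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical set of directories an install task would need.
#[derive(Debug, PartialEq)]
struct InstallDirs {
    bindir: PathBuf,
    datadir: PathBuf,
    mandir: PathBuf,
}

/// Resolve install directories from user-supplied variables.
/// A relative override is interpreted relative to the prefix; an absolute
/// override is used as-is (this is how `Path::join` behaves).
fn install_dirs(vars: &HashMap<String, String>) -> InstallDirs {
    let prefix = PathBuf::from(
        vars.get("PREFIX").map(String::as_str).unwrap_or("/usr/local"),
    );
    // Look up an override, or fall back to the given default.
    let dir = |name: &str, default: PathBuf| -> PathBuf {
        match vars.get(name) {
            Some(value) => prefix.join(value),
            None => default,
        }
    };
    let datadir = dir("DATADIR", prefix.join("share"));
    InstallDirs {
        bindir: dir("BINDIR", prefix.join("bin")),
        // By default man pages live below the data directory.
        mandir: dir("MANDIR", datadir.join("man")),
        datadir,
    }
}
```

A packager would then pass something like PREFIX=/usr, and everything else follows from that unless overridden individually.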
The advantage of CMake is that we can use it to actually ship software, so cargo should still be considered the secondary system. I'm in favour of transitioning across, but dropping a widely used and understood dependency is not a particularly high priority for me.
I don't see much benefit in replacing it with cargo-make or another runner of some kind. cargo xtask, a shell script or a Makefile are probably reasonable approaches.
In the spirit of simplicity I would suggest https://github.com/casey/just or cargo xtask over GNU Make, but overall the rationale does make sense.
I definitely prefer the CMake build system, as it's very easy for Fedora to correctly build and install things with it, as we have a reliable pattern for CMake. I suspect most other distros feel similarly.
Not sure it makes sense in the long term to necessitate the usage of multiple build systems just due to fish's lineage; consolidating into a single unified form clearly seems optimal.
I definitely prefer the CMake build system, as it's very easy for Fedora to correctly build and install things with it, as we have a reliable pattern for CMake.
It's understandable to want to use a well-established system. In the case of fish, and Rust projects in general, I imagine the main challenge is dealing with the Rust parts, like ensuring dependencies are available locally so builds don't require network access, not figuring out which command is used to wrap Rust compilation. But I have never packaged software so I'd appreciate more detailed views of distro packagers. What does CMake offer that would be lost by wrapping cargo with a different command?
Cargo doesn't know how to probe for external dependencies and can't really handle install paths well. It's not really designed to be a software installation system.
That's fair. We are moving away from external dependencies, so I don't think much probing that's not already happening in Rust will be required in the future.
Regarding installation, I agree that cargo is not well suited for the task. We are close to making fish a single binary without any external dependencies. There will still be some data files which should be installed on systems, like man pages. Looking at our current cmake/Install.cmake file, we have to do quite a lot of coding in CMake to install everything. The only thing we get from CMake is the set of CMAKE_INSTALL_* variables at the top. If we support similar variables for a CMake alternative, does CMake still have other advantages aside from being familiar?
DESTDIR support is the other thing that's important - the ability to configure everything to build into a certain prefix but to prepend something to that prefix when actually installing. Packaging systems use this to capture the actual contents of the package.
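For anyone unfamiliar with it, the DESTDIR behaviour can be sketched in a few lines of Rust. The function name staged_path is hypothetical. The one subtlety is that Path::join discards the base when given an absolute path, so the configured path has to be made relative before it is appended to DESTDIR.

```rust
use std::path::{Component, Path, PathBuf};

/// Return where a file should actually be written during installation.
/// `configured` is the final location baked in at build time (e.g. /usr/bin/fish);
/// `destdir` is the optional staging root a packaging tool sets at install time.
fn staged_path(destdir: Option<&Path>, configured: &Path) -> PathBuf {
    match destdir {
        // No staging root: install straight to the configured location.
        None => configured.to_path_buf(),
        Some(root) => {
            // Drop the leading root (and any Windows prefix) so that `join`
            // appends to `root` instead of replacing it.
            let relative: PathBuf = configured
                .components()
                .filter(|c| !matches!(c, Component::RootDir | Component::Prefix(_)))
                .collect();
            root.join(relative)
        }
    }
}
```

So with DESTDIR=/tmp/pkg and a configured path of /usr/bin/fish, the file is written to /tmp/pkg/usr/bin/fish, while the binary itself still believes it lives under /usr.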
Downstream packagers do not love having to learn new build systems, which is understandable, and we'd need to make sure the extra controls are well-documented.
https://github.com/rust-lang/cargo/issues/2729#issuecomment-2407858338 is a reasonable survey of the current situation. A scan of the 203 manually-packaged Rust crates in Debian Trixie shows 23 use meson, 12 use CMake, a few use autoconf/automake and the rest largely hack around Cargo in various bespoke ways (generally written by the Debian packager rather than the upstream project).
cargo-parcel, as noted in that comment, looks promising (can be used with xtask pattern rather than installed separately, supports DESTDIR and most useful paths) but does not appear to be actively maintained.
As I think we're all aware, the overwhelming majority of fish users are going to install from apt/Homebrew/dnf, not cargo.