
Generate multiple packages with a single builder

Open giordano opened this issue 4 years ago • 10 comments

From time to time I look to the package managers of Linux distributions to see if we can pick up some interesting ideas. One thing that I think would be really cool to have here is to be able to generate multiple JLL packages with a single builder: the result of a build doesn't go into a single tarball, but it might be split into many of them.

One fancy application is to be able to generate:

  • Libfoo_jll: contains only bin/ and lib/, this is the runtime part, what the Julia packages will use;
  • Libfoo_dev_jll: contains include/. Header files are generally useless for Julia packages and they mostly clutter ~/.julia/artifacts/ with dozens of small files. Ideally, this would be automatically installed, if available, when Libfoo_jll is used as a dependency in a build;
  • Libfoo_dbg_jll: contains the debug symbols of the shared library, that users can optionally install to get more useful debug information about crashes or errors. Based on an idea by @keno.
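
As a rough illustration of the runtime/dev split listed above, here is a minimal shell sketch. It is not a BinaryBuilder feature: the `split_prefix` helper, the `runtime` and `dev` directory names, and the mock install tree are all assumptions made up for this example.

```shell
set -eu

# Hypothetical helper (not a BinaryBuilder API): split an install tree into
# a "runtime" part (bin/ and lib/) and a "dev" part (include/).
split_prefix() {
    local src="$1"
    local out="$2"
    local d
    mkdir -p "${out}/runtime" "${out}/dev"
    for d in bin lib; do
        if [ -d "${src}/${d}" ]; then
            mv "${src}/${d}" "${out}/runtime/"
        fi
    done
    if [ -d "${src}/include" ]; then
        mv "${src}/include" "${out}/dev/"
    fi
}

# Mock install tree standing in for the result of a build.
demo=$(mktemp -d)
mkdir -p "${demo}/prefix/bin" "${demo}/prefix/lib" "${demo}/prefix/include"
touch "${demo}/prefix/bin/foo" "${demo}/prefix/lib/libfoo.so" "${demo}/prefix/include/foo.h"

split_prefix "${demo}/prefix" "${demo}/split"

# List what ended up in each part.
find "${demo}/split" -type f | sort
```

Each resulting directory could then be packed into its own tarball; how those tarballs would map onto JLL packages is exactly the open question of this issue.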

Also, I think that LLVM_full_jll is currently "wrong": IMO it should simply be an empty metapackage binding all the other pieces. Instead, it's now a monster package containing the same data as its pieces, which means that if we use both LLVM_full_jll and libLLVM_jll in a build, they would step on each other's toes. Having a single builder that produces all other subpackages would probably make @vchuravy happy, too.

giordano avatar May 06 '20 23:05 giordano

maybe also a jll for the original source directory such that we can download it automatically in the debugger if necessary.

Keno avatar May 06 '20 23:05 Keno

I agree that this is desirable. I'm not entirely sure that the right way to do it is to create multiple JLL packages; or at least, not necessarily the user-facing way.

Here are my thoughts:

  • For some projects, we have the genuine desire to split a JLL into multiple independent packages; Clang_jll and LLVM_jll really don't have anything to do with each other; sometimes you may want Clang_jll and not LLVM_jll and vice-versa. The fact that they both stem from the same build process is more or less an implementation detail. (Oh, and they both rely upon LibLLVM_jll, but that's fine). To save on build time/duplicated effort, it would be nice to be able to split a single build_tarballs()'s output into multiple, independent, JLL packages.

  • For most of the projects in existence, we have a mixture of files; some things are generally essential (dynamic libraries, executables), some things are nice to have (include files, external debugging symbols) and some things are almost never needed (static libraries). It would be nice to be able to split a single build_tarballs()'s output into different "configurations", making use of https://github.com/JuliaLang/Pkg.jl/issues/1780

First and foremost, for this to work nicely, I think we're going to need to break up build_tarballs() a bit; right now we have everything built with the very deep assumption that we can flow smoothly from sources to JLLs, but that breaks down in a few places such as IntelOpenMP, CUDA, LLVM, MKL, etc... I think we need to split build_tarballs() up into two separate pieces: the piece where we call autobuild() as many times as we need, generating unpacked prefixes of build products, then the piece where we carve those prefixes up into JLLs. We can, of course, continue to expose a build_tarballs() that does all that automatically, but we need to have a re-think of the underlying mechanisms to make this effortless.

API overview

I envision having a function build_binaries!() that we call in a similar manner to build_tarballs(), but it doesn't take in name, version or products; all it does is build unpacked prefixes, and return the meta information about that build, coalesced into a single meta object:

meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

This meta object will contain all the information we're used to having in e.g. the JSON object (and in fact will be what we serialize with --json-meta in the future; this will make it much easier to understand how we mock out parts of the BB pipeline when running on Yggdrasil), and is what we will use when we perform the second step, which is extraction and JLL construction:

# This would be defined by default, but just explicitly make it for illustration's sake
everything_extractor = raw"""
mv ${srcdir}/* ${prefix}/
"""
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)

What this enables

This gives us the flexibility to do an awful lot:

  • Do the "fancy toys" trick to support separate sources/dependencies/whatever per-platform:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, filter(p -> !Sys.iswindows(p), platforms), dependencies)
build_binaries!(meta, ARGS, win_sources, win_script, filter(p -> Sys.iswindows(p), platforms), dependencies)
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)

Note that if we're going through the trouble of rewriting this stuff, we can probably get rid of should_build_platform() in fancy toys by doing that automatically inside of build_binaries!(); e.g. if a platform is given within ARGS, use that to filter out the passed-in platforms objects, and if there's nothing left, return eagerly.

  • Split a single build into multiple JLL packages:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

LibLLVM_extractor = raw"""
# Copy over `llvm-config`, `libLLVM` and `include`, specifically.
mkdir -p ${prefix}/include ${prefix}/tools ${libdir} ${prefix}/lib
mv -v ${srcdir}/include/llvm* ${prefix}/include/
mv -v ${srcdir}/tools/llvm-config* ${prefix}/tools/
mv -v ${srcdir}/$(basename ${libdir})/*LLVM*.${dlext}* ${libdir}/
mv -v ${srcdir}/lib/*LLVM*.a ${prefix}/lib
"""
build_jll!(meta, "LibLLVM_jll", version, platforms, dependencies, LibLLVM_extractor)

Clang_extractor = raw"""
...
"""
build_jll!(meta, "Clang_jll", version, platforms, dependencies, Clang_extractor)
...
  • Construct multiple "configurations" or "variants" for a JLL, making use of https://github.com/JuliaLang/Pkg.jl/issues/1780 to provide access to a "default" variant (just the essentials), a "build" variant (including headers and static libraries), and a "debug" package (less optimizations, more debug info):
# Build once with `-O2` and extract it into a default variant, as well as a "build" variant:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

base_extractor = raw"""
for f in $(find_binary_objects ${srcdir}); do
    rp=$(relpath ${srcdir} ${f})
    mkdir -p ${prefix}/$(dirname ${rp})
    mv ${f} ${prefix}/${rp}
done
"""
build_jll!(meta, name, version, platforms, dependencies, base_extractor)
build_jll!(meta, "$(name)+build", version, platforms, dependencies, everything_extractor)

# Build once with `-O1 -g` and bundle it into the "debug" variant:
debug_meta = BuildMetadata()
build_binaries!(debug_meta, ARGS, sources, debug_script, platforms, dependencies)
build_jll!(debug_meta, "$(name)+debug", version, platforms, dependencies, everything_extractor)

The "variants" would all be put into the same JLL release as artifacts with names that have the + postpended (as that's not a valid JLL name, of course), and we'd have ways for the user to request which artifacts get installed on their system through things like https://github.com/JuliaLang/Pkg.jl/issues/1780. We could arbitrarily decide that BB itself always installs the +build variant (if available) into the prefix when building. (Or we could even allow for Dependency() objects to provide a variant kwarg)

What do you guys think?

staticfloat avatar May 14 '20 21:05 staticfloat

That sounds fantastic! I would call them build! and generate!/package!

vchuravy avatar May 14 '20 21:05 vchuravy

Intelligent building and serving of debug symbols

If we had "partial artifact" download support, we could simplify this a bit, in that we could generate only a single tarball that has everything: binaries, headers, and separate debug files. We could then work some Pkg server magic to allow requesting a union of subtrees rather than always the entire content tree. This would allow us to, for instance, request the union of subtrees that corresponds to just the shared libraries within lib/ and the binaries within bin. The PkgServer would then generate a cut-down tarball containing those resources and pass it down to us. This would be the "minimal" artifact variant, while the "build" variant would include things like headers, static libraries, etc... Finally, the "full" variant would include external debug symbols that were stripped out from the executables during build.
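
To make the "union of subtrees" idea concrete, here is a small local sketch of cutting a "minimal" tarball out of a full tree. It is only an illustration under assumptions: the mock tree, the file names, and the choice to drop `*.a` files are made up for the example, and nothing here reflects an existing PkgServer feature.

```shell
set -eu

# Mock "full" artifact tree with binaries, shared and static libraries, headers.
tree=$(mktemp -d)
mkdir -p "${tree}/bin" "${tree}/lib" "${tree}/include"
touch "${tree}/bin/foo" "${tree}/lib/libfoo.so" "${tree}/lib/libfoo.a" "${tree}/include/foo.h"

# "Minimal" variant: everything in bin/ plus the non-static files in lib/.
# The file list is fed to tar on stdin via `-T -`.
minimal="${tree}.minimal.tar.gz"
(cd "${tree}" && find bin lib -type f ! -name '*.a' | sort | tar -czf "${minimal}" -T -)

# Show what the cut-down tarball contains.
tar -tzf "${minimal}"
```

With this mock tree the listing should contain only `bin/foo` and `lib/libfoo.so`; a "build" variant would simply use a wider file selection over the same tree.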

To strip out debug symbols into external files, we can use the following tools:

  • For ELF and COFF files:
objcopy --only-keep-debug $file $debug_file
strip --strip-debug --strip-unneeded $file
objcopy --add-gnu-debuglink $debug_file $file

Assuming we are able to work our PkgServer magic above, we will be able to stream down content trees where these files exist on-disk side-by-side, which makes the whole thing much easier. If we must keep the files separate, this becomes more difficult: we'd probably have to modify files on-disk to get relative pathing correct, or force debuggers to do the searching themselves (this is easier if we embed build ids, see below).

  • For Mach-O files, we can use dsymutil to create .dSYM bundles (or files, if we want, by passing in -flat):
dsymutil $file

Note that we probably want to start adding --build-id=sha1 to our LDFLAGS to aid in debugging efforts, as that allows for easier matching of files.

Doing this magically via compiler wrappers/BB magic

We can force -g into all compiler invocations via our compiler wrappers, and invoke dsymutil upon all executables at the end of the build if we're running on Darwin. It really should be that simple. :)

staticfloat avatar Jun 20 '20 20:06 staticfloat

Oh, I also just thought to myself it would be cool to switch between e.g. debug and non-debug versions through Preferences, so a JLL package would default to installing a minimal variant, but it can be opted-in to a higher variant by setting a Preference in the overall Project.toml that is using the JLL.

staticfloat avatar Jun 20 '20 22:06 staticfloat

Thinking about this again, it would also be really sweet for debug versions of JLLs to include all source files referenced by the DWARF files, stored in a predictable place (like <$artifact_path>/src) so that we can use source-map to get lldb/gdb to find the source when we're debugging an artifact.

We can add a post-processing step that inspects all DWARF files, finds all referenced source files (even autogenerated ones) and stores them in the appropriate location within a $destdir/src directory. Then we just need a convenient way to map /workspace/srcdir => $artifact_path/src within lldb/gdb and we'll have a really slick debugging experience for our users.
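
A minimal sketch of that post-processing step, under stated assumptions: the list of referenced source files is written by hand here (extracting it from the DWARF data is left out), and the temporary directories merely stand in for /workspace/srcdir and the installed artifact. The two printed lines use gdb's `set substitute-path` and lldb's `target.source-map` setting.

```shell
set -eu

# Stand-ins for this demo: in a real build these would be /workspace/srcdir
# and the installed artifact directory.
build_root=$(mktemp -d)
artifact=$(mktemp -d)
mkdir -p "${build_root}/libfoo/src"
echo 'int foo(void) { return 42; }' > "${build_root}/libfoo/src/foo.c"

# Hypothetical list of source files referenced by the debug info, one per line.
list=$(mktemp)
printf '%s\n' "${build_root}/libfoo/src/foo.c" > "${list}"

# Copy each referenced file into <artifact>/src, keeping its path relative
# to the build root.
while IFS= read -r f; do
    rel=${f#"${build_root}/"}
    mkdir -p "${artifact}/src/$(dirname "${rel}")"
    cp "${f}" "${artifact}/src/${rel}"
done < "${list}"

# Debugger settings that remap the build-time path to the artifact copy.
gdb_cmd="set substitute-path /workspace/srcdir ${artifact}/src"
lldb_cmd="settings set target.source-map /workspace/srcdir ${artifact}/src"
printf '%s\n%s\n' "${gdb_cmd}" "${lldb_cmd}"
```

The printed commands could be pasted into a debugger session or written to an init file; how a JLL would expose them to users automatically is not settled in this thread.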

staticfloat avatar Aug 17 '20 02:08 staticfloat

I think most of our compilers support split dwarf info? https://gcc.gnu.org/wiki/DebugFissionDWP

vchuravy avatar Aug 17 '20 12:08 vchuravy

In the last few weeks I've been thinking about this issue again, and coming up with beautiful ideas like using Preferences.jl to install debug versions of packages, only to realise that Elliot already proposed it :disappointed:

Another idea that just came to my mind is to have dev/debug tarballs as lazy artifacts of the same JLL package, instead of their own packages, but Elliot anticipated me again:

The "variants" would all be put into the same JLL release as artifacts with names that have the + postpended

I like this idea! In particular, I'm thinking about splitting also the logs into their own tarballs. A nice benefit is that this could make the runtime tarball reproducible across multiple identical rebuilds.

One additional thing to mention is that now that we have JLLWrappers.jl we can automatically generate functions to download the additional artifacts, without having to change anything in the packages.

giordano avatar Jun 06 '21 00:06 giordano

Just wanted to add that for JLLs which link against libjulia, it would be nice to have debug variants which link against libjulia-debug (this is orthogonal to the question of debug symbols and how to handle them). Right now, I am debugging a Julia package (Oscar.jl) involving four JLLs linking against libjulia (libcxxwrap-julia, libsingular-julia, libpolymake-julia, GAP) and it isn't exactly fun.

So perhaps there could be another "variant marker" indicating "download this instead of the default if this is a Julia debug build"

fingolfin avatar Jul 27 '21 07:07 fingolfin

I just came across debuginfod which seems very complementary. It allows for gdb and others to auto-fetch debuginfo!

vchuravy avatar May 10 '22 00:05 vchuravy