cmd/dist: remove precompiled .a files from binary distributions

Open jayconrod opened this issue 4 years ago • 26 comments

The downloadable archives at https://golang.org/dl/ currently contain precompiled .a files for all packages in the standard library. Each archive has a set of .a files for one platform, plus another set for -race builds.

These files take up quite a bit of space, and they're fast to rebuild on demand. We should consider removing them from binary Go distributions.

For example, in 1.17rc1 on darwin/amd64, the whole distribution uncompressed is 435M. The pkg/darwin_amd64 directory is 97M (22%), and the pkg/darwin_amd64_race directory is 109M (25%). Compressed as a zip file with default settings, the archive is 135M. Without .a files, it's 86M (63%).
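The percentages above can be re-derived from the quoted sizes. This is only a worked-arithmetic sketch: the constant names are made up for illustration, the sizes are the ones quoted in the paragraph, and percentages are truncated rather than rounded, which matches the figures as quoted.

```go
package main

import "fmt"

// percent returns part as a percentage of whole, truncated to an integer.
func percent(part, whole float64) int {
	return int(part / whole * 100)
}

func main() {
	// Sizes in MB, as quoted above for 1.17rc1 on darwin/amd64.
	const (
		total      = 435.0 // whole distribution, uncompressed
		pkg        = 97.0  // pkg/darwin_amd64
		pkgRace    = 109.0 // pkg/darwin_amd64_race
		zipWith    = 135.0 // zip archive including .a files
		zipWithout = 86.0  // zip archive without .a files
	)

	fmt.Printf("pkg share of uncompressed size:      %d%%\n", percent(pkg, total))
	fmt.Printf("pkg race share of uncompressed size: %d%%\n", percent(pkgRace, total))
	fmt.Printf("both .a sets together:               %d%%\n", percent(pkg+pkgRace, total))
	fmt.Printf("zip without .a vs. zip with .a:      %d%%\n", percent(zipWithout, zipWith))
	fmt.Printf("download saved:                      %.0f MB\n", zipWith-zipWithout)
}
```

In other words, the two sets of .a files account for a little under half of the unpacked tree, and dropping them saves roughly 49M per compressed download.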

After #40042 was fixed, the C compiler version is included in the cache key for each package that uses cgo. That means that if no C compiler is installed on the system, or if a different C compiler version is installed (very common), go build and other commands will rebuild packages that depend on cgo instead of using the versions installed in $GOROOT/pkg. As of 1.17rc1, there are 27 packages in std that use cgo directly or indirectly, most prominently, net. The precompiled files for these packages are almost never used unless the installed C compiler exactly matches the version used to build the Go distribution.
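To make the cache-key effect concrete, here is a minimal sketch of a content-addressed key that includes the C compiler version. It is not the go command's real algorithm: `actionKey`, its inputs, and the version strings are all hypothetical, chosen only to show why a different (or missing) compiler yields a different key and therefore a cache miss.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// actionKey is a hypothetical, heavily simplified stand-in for a build
// cache key. The real go command hashes many more inputs.
func actionKey(pkgPath, goVersion, ccVersion string) string {
	h := sha256.New()
	for _, in := range []string{pkgPath, goVersion, ccVersion} {
		// Length-prefix each input so adjacent fields cannot run together.
		fmt.Fprintf(h, "%d:%s\n", len(in), in)
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Key recorded when the distribution was built (illustrative values).
	shipped := actionKey("net", "go1.17rc1", "clang 12.0.0")
	// Key computed on a user's machine with a different compiler version.
	local := actionKey("net", "go1.17rc1", "clang 12.0.5")
	// Key computed on a machine with no C compiler at all.
	none := actionKey("net", "go1.17rc1", "")

	fmt.Println("shipped == local:", shipped == local)
	fmt.Println("shipped == none: ", shipped == none)
}
```

Under this model, any mismatch in the compiler version string means the precompiled archive is not reused and the package is rebuilt.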

Note that the fix for #40042 is causing builds to fail on systems without a C compiler installed (#47215), so it may be partially or completely rolled back in 1.17. If we implement this proposal, we'd have the same problem, so we may want to think about changing the default value of CGO_ENABLED (#47251).

cc @rsc @bcmills @matloob

jayconrod avatar Jul 16 '21 21:07 jayconrod

Another reason to do this is that internet speeds vary wildly. To some people, downloading an extra 50MiB is a split second, but to many others it's staring at their screen for an extra minute. I don't have an ETA for FTTH reaching my building in the UK, for example :)

CPU speeds and compile times also vary, but I think the lower bound is much less worrying - even with a five-year-old thin laptop, one should be able to build the standard library in 10-20s.

mvdan avatar Jul 17 '21 21:07 mvdan

I think it's pretty important that a Go binary installation work on a system with no C compiler installed. And at least in the past on macOS Go programs would only work if using the cgo version of the net package. So I think we at least need to provide the .a files for the standard library packages that use cgo, which I think is currently net and os/user.

ianlancetaylor avatar Jul 18 '21 21:07 ianlancetaylor

> I think it's pretty important that a Go binary installation work on a system with no C compiler installed. And at least in the past on macOS Go programs would only work if using the cgo version of the net package.

Does that imply that on macOS it already isn't possible to build a working Go program that depends on net with CGO_ENABLED=0?

If so, that would imply that without a working C compiler it also isn't possible to build programs that depend on both net and some third-party package whose source files vary based on the cgo build constraint: the cgo constraint is met if CGO_ENABLED=1 even if no C compiler is present. So that would force users to choose between two options:

  1. Build with CGO_ENABLED=1 and get build errors due to attempting to compile third-party //go:build cgo files without a working C compiler.
  2. Build with CGO_ENABLED=0, and get the appropriate source files for third-party packages but a non-working net package.
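The file-selection half of this dilemma can be sketched with the standard go/build/constraint package. The helper `fileSelected` is hypothetical, and the sketch models only the cgo tag (treating it as set exactly when cgo is enabled); the point is that evaluating the constraint never checks whether a C compiler actually exists.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// fileSelected reports whether a file carrying the given //go:build line
// would be selected, given only whether the cgo tag is set. No other tags
// are modeled, and no C compiler is ever consulted.
func fileSelected(line string, cgoEnabled bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool {
		return tag == "cgo" && cgoEnabled
	}), nil
}

func main() {
	for _, line := range []string{"//go:build cgo", "//go:build !cgo"} {
		for _, enabled := range []bool{true, false} {
			ok, err := fileSelected(line, enabled)
			if err != nil {
				fmt.Println("parse error:", err)
				continue
			}
			fmt.Printf("%-16s cgo tag set=%-5v selected=%v\n", line, enabled, ok)
		}
	}
}
```

With the tag set, the `//go:build cgo` files are chosen and then need a working C compiler to build; with it unset, the `!cgo` files are chosen, which is the second option above.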

I think we could resolve that in one of three ways:

  1. Change the net package to not require a C compiler on macOS.
  2. Change cmd/go to build the net package on macOS using a C compiler even if CGO_ENABLED=0.
  3. Declare that a C compiler is required in order to build Go programs that depend on net on macOS.

bcmills avatar Jul 19 '21 14:07 bcmills

On further consideration, I don't think option (2) above is viable. There are too many other ways to invalidate cache entries, such as by setting (or not setting) -trimpath, and those will also cause the precompiled libraries not to be used.

And I believe our builders already invalidate the cache for libraries bundled in the macOS release, because they build using a non-default value for CGO_CFLAGS (see #33598, #46347, #46292).

bcmills avatar Jul 19 '21 15:07 bcmills

@ianlancetaylor

> I think it's pretty important that a Go binary installation work on a system with no C compiler installed.

I would agree with you a priori, but as I understand the current state of the world, the introduction of the build cache broke this case many releases back, with no complaints (or at least not enough that we've tried to fix it). The .a files that ship do not actually have the right cache keys embedded inside them to be used by essentially any out-of-the-box install of Go. Instead, they get recomputed (in the cache only, not in the install location) the first time you build something.

Assuming I am right about that (I have not done the experiment myself), then I think it is OK to just drop the .a files entirely and not worry about the "cgo without C compiler" case anymore.

rsc avatar Jul 19 '21 17:07 rsc

> I would agree with you a priori, but as I understand the current state of the world, the introduction of the build cache broke this case many releases back, with no complaints (or at least not enough that we've tried to fix it). The .a files that ship do not actually have the right cache keys embedded inside them to be used by essentially any out-of-the-box install of Go. Instead, they get recomputed (in the cache only, not in the install location) the first time you build something.

@rsc I was under this impression too until looking at #47215 last week. The C compiler version only became part of the cache key in #40042. So this is true for 1.17rc1, but not for 1.16 or lower. If neither CC nor CGO_CFLAGS are explicitly set, the go command will happily use the precompiled runtime/cgo.a, even if no C compiler is installed. We're looking at a narrow rollback of this in CL 335409.

(@bcmills pointed out cases where the cache keys don't match on macOS, but they do match on Linux, and that's the platform I'm most worried about, since a lot of folks are building in Docker without installing a C compiler).

jayconrod avatar Jul 19 '21 18:07 jayconrod

> 1. Change the net package to not require a C compiler on macOS.

I wonder how viable this is?

I know very little about the guts of the net package. Reading https://pkg.go.dev/net#hdr-Name_Resolution, it looks like there are Go and cgo implementations of name resolution. When cgo is available, they're both compiled in and selected dynamically (GODEBUG=netdns=go or GODEBUG=netdns=cgo can pick one at run-time). On macOS, the cgo version seems to be preferred.
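Besides the process-wide GODEBUG setting, the net package also lets a program ask for the Go resolver on a single net.Resolver through its PreferGo field. The sketch below is only an illustration: `newGoResolver` is a made-up helper, and the "localhost" lookup and two-second timeout are arbitrary choices.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that prefers Go's built-in DNS resolver
// on platforms where it is available, scoped to this resolver only.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// The result depends on the local machine's configuration, so it is
	// printed rather than assumed.
	addrs, err := newGoResolver().LookupHost(ctx, "localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("addresses:", addrs)
}
```

Running the same program with GODEBUG=netdns=cgo or GODEBUG=netdns=go set in the environment is a quick way to compare the two implementations on a given system.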

But does the cgo version actually need to be written with cgo? On macOS at least, we link dynamically with /usr/lib/libSystem.B.dylib even with CGO_ENABLED=0, so we don't need the external linker. CL 227037 leads me to believe it's possible to call C code from Go with assembly trampolines. (This is mostly a thought experiment; I don't want to re-implement net or awaken Cthulhu).

jayconrod avatar Jul 19 '21 18:07 jayconrod

@jayconrod I think you are correct that with our current implementation we could call the macOS libraries directly without having to use cgo. The same is true on AIX and Solaris, for that matter.

ianlancetaylor avatar Jul 19 '21 19:07 ianlancetaylor

re CGO_ENABLED=0 GOOS=darwin: https://github.com/golang/go/issues/12524#issuecomment-853965035

jfesler avatar Jul 19 '21 20:07 jfesler

Building with CGO_ENABLED=0 is effectively the same as cross-compiling, with the same headaches: the pure-Go resolver (netdns=go) is simply inadequate, at least for use cases where the operating system routes DNS queries to different resolvers based on domain name. /etc/resolv.conf does not adequately describe the OS DNS routing behavior, so there's no good way for the Go resolver to ever do this correctly.

jfesler avatar Jul 19 '21 20:07 jfesler

I think we can plausibly check in the relevant cgo-generated pieces for the few stdlib packages that need cgo. Then we wouldn't need a special case for "install runtime/cgo.a but no other .a files".

rsc avatar Jul 21 '21 19:07 rsc

My suggestion here would be to define that go install x only ever installs binaries, never .a files. That would change this from being about cmd/dist and the Go distribution to being about cmd/go. The distributions would shrink as a side effect, and the meaning of go install would become clearer.

rsc avatar Jul 28 '21 16:07 rsc

Agreed that go install should no longer install .a files.

I think the main technical blocker is being able to build net with its current netcgo functionality without actually needing cgo. That seems like it can be done either by checking in pre-generated cgo code or by linking against C code directly with //go:cgo_import_dynamic. Not sure exactly what it will look like, but it seems plausible and probably a good change to make anyway.

For a broad go install change, I'm a bit worried about -buildmode=shared and -linkshared. Not sure those work at all in module mode currently, so it would be good to have a plan to deprecate or fix.

jayconrod avatar Jul 28 '21 16:07 jayconrod

Yes, buildmode=shared is dead and has been for a long time. Anyone using it must not be using modules. We could potentially do a special case in buildmode=shared for just the standard library, but I don't see how to make more than that work (and even that is a bit iffy).

/cc @mwhudson

rsc avatar Aug 04 '21 17:08 rsc

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc avatar Aug 04 '21 18:08 rsc

@mwhudson, do you know of any existing uses of -buildmode=shared and -linkshared anymore? We believe they have been broken since modules were introduced and are thinking about removing them. Thanks!

rsc avatar Aug 11 '21 17:08 rsc

I'm not aware of any uses of those build modes currently, no. They never quite got to the point of being genuinely useful before our requirements shifted a bit, unfortunately.

mwhudson avatar Aug 12 '21 00:08 mwhudson

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc avatar Aug 18 '21 18:08 rsc

> My suggestion here would be to define that go install x only ever installs binaries, never .a files. That would change this from being about cmd/dist and the Go distribution to being about cmd/go. The distributions would shrink as a side effect, and the meaning of go install would become clearer.

Would this removal leave any method of installing .a files aside from build -o file.a?

(Context: For reproducible builds we often use isolated build environments where the build cache cannot be persisted between builds. Options for reusing build artifacts currently are, in order of ease:

  1. Use install in GOPATH-mode and save the installed archives in build output;
  2. Build/install in module-aware mode and use more-opaque and possibly-brittle methods to determine which parts of the build cache to save in build output; or
  3. Use build -o file.a for each package, which requires getting the build order the same as cmd/go so the actionID portion of BuildIDs match.

My concern is that this change would remove option 1, without adding any facility for making 2 or 3 less brittle.)

iskarian avatar Aug 24 '21 03:08 iskarian

> Would this removal leave any method of installing .a files aside from build -o file.a?

@iskarian Probably not. And if go install doesn't write .a files to GOROOT/pkg or GOPATH/pkg, there's no reason for it to check for or use .a files there either.

I think your option 2 can be done safely, either by running go build std with an empty cache, then saving the entire cache, or by running go list -json -export std and saving the files in the Target field. We'll do something like that to support Bazel and Blaze, which also don't use Go's cache and only use the go command to build std.
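A rough sketch of the go list approach follows. It assumes a go binary on PATH and reads the Export field, which `go help list` documents as the file containing export data when -export is used. The names `listedPackage` and `parseExports` are invented for this example, and this is not how Bazel or Blaze actually do it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"os/exec"
)

// listedPackage holds the two fields of `go list -json` output used here.
type listedPackage struct {
	ImportPath string
	Export     string
}

// parseExports decodes the stream of JSON objects printed by
// `go list -json -export` and maps import paths to export files.
// Packages with no export file (for example unsafe) are skipped.
func parseExports(data []byte) (map[string]string, error) {
	exports := make(map[string]string)
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		var p listedPackage
		err := dec.Decode(&p)
		if err == io.EOF {
			return exports, nil
		}
		if err != nil {
			return nil, err
		}
		if p.Export != "" {
			exports[p.ImportPath] = p.Export
		}
	}
}

func main() {
	// -export builds the packages as needed, so this may take a while
	// when the cache is empty.
	cmd := exec.Command("go", "list", "-json", "-export", "std")
	cmd.Stderr = os.Stderr
	out, err := cmd.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "go list failed:", err)
		os.Exit(1)
	}
	exports, err := parseExports(out)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decoding go list output:", err)
		os.Exit(1)
	}
	for path, file := range exports {
		fmt.Printf("%s\t%s\n", path, file)
	}
}
```

A build system could copy the listed files into its own output tree; whether those files stay usable as full archives in future releases is not something this sketch can guarantee.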

The go command is really meant to be used with a cache though. The cache has been mandatory since Go 1.12, and part of the intent was to eventually remove GOPATH/pkg. At this point, I don't think there's a good reason to design around not having a cache.

jayconrod avatar Aug 24 '21 15:08 jayconrod

@jayconrod Reasonable. Thanks for the explanation.

> I think your option 2 can be done safely, either by running go build std with an empty cache, then saving the entire cache, or by running go list -json -export std and saving the files in the Target field. We'll do something like that to support Bazel and Blaze, which also don't use Go's cache and only use the go command to build std.

Saving the cache (or cache deltas) wholesale is less-than-desirable because it is not easy to see what each entry corresponds to (and verify that only what is supposed to be saved, is saved). Right now, after building, go list -json -export mypackage and saving the cache entries in the Export field seems to work. But will that be stable in the future? Export's description only guarantees that the file it points to contains export data, even though in practice this seems to always be the .a file.

iskarian avatar Aug 24 '21 17:08 iskarian

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc avatar Aug 25 '21 18:08 rsc

Change https://go.dev/cl/444015 mentions this issue: go/internal/gcimporter,cmd/compile/internal/importer: skip tests that need 'go list' on js/wasm

gopherbot avatar Oct 19 '22 16:10 gopherbot

Change https://go.dev/cl/432535 mentions this issue: cmd/go: don't install most GOROOT .a files in pkg

gopherbot avatar Oct 27 '22 17:10 gopherbot

Change https://go.dev/cl/436135 mentions this issue: cmd/dist: produce intermedate .a files in a temporary location

gopherbot avatar Oct 27 '22 17:10 gopherbot

Change https://go.dev/cl/446116 mentions this issue: cmd/go: don't substitute '$WORK' for work directory in -x heredocs

gopherbot avatar Oct 28 '22 01:10 gopherbot

Change https://go.dev/cl/446635 mentions this issue: cmd/compile/internal/types2: fix tests on js/wasm

gopherbot avatar Oct 31 '22 15:10 gopherbot

Change https://go.dev/cl/446638 mentions this issue: cmd/api: skip tests when 'os/exec' is supported but 'go build' is not

gopherbot avatar Oct 31 '22 19:10 gopherbot

Change https://go.dev/cl/446735 mentions this issue: internal/releasetargets: remove Race config from 1.20 linux-amd64

gopherbot avatar Oct 31 '22 19:10 gopherbot

Change https://go.dev/cl/445358 mentions this issue: cmd/go: add move test for goroot

gopherbot avatar Oct 31 '22 20:10 gopherbot