go icon indicating copy to clipboard operation
go copied to clipboard

proposal: cmd/go: do not download “modules” that contain no go.mod or *.go

Open bcmills opened this issue 6 years ago • 16 comments

At the moment, go mod download will happily try to extract and download any arbitrary repository as long as it can be resolved by some means (through a hard-coded hosting service such as github.com, or using a distinguished extension like .git), even if it does not contain anything even marginally related to building Go code.

I am not aware of any reasonable use-case for such a repository:

  • It's not useful for storing test data, because we currently provide no mechanism for the tests to actually locate that data. (Modules are not guaranteed to be loaded from the module cache — for example, they might be subject to a replace directive — and since the test itself is run within the directory containing its source code, it has no way to locate the data or run go list within the module that invoked it.)

  • It's not useful for C headers (for use with cgo), for the same reason.

  • It might be useful for fetching non-Go inputs to go generate: in theory, the generator could run go mod download $MODULE to locate the sources at the required version. But the output of go generate is intended to be checked in anyway, which makes the use of modules somewhat spurious: if an explicit version of the non-Go inputs appears in the module's requirements, then everyone using the generated package will have an extra module to fetch that is guaranteed to have no effect on the build, and in most cases the go generate program can just as easily git clone (or similar) the input data at a specific revision.

Furthermore, if someone did find a way to make modules without Go source code useful (for the above use-cases or others), it's trivial to add a go.mod file to indicate that the repository really is somehow intended for use with Go source code. (We need to support go.mod-only modules anyway, since they can arise naturally when splitting a large root module into smaller nested modules.)


On the other hand, module proxies tend to rely on the go command to decide what is or is not a valid module, and accepting arbitrary non-Go repositories potentially exposes such proxies to a significant amount of additional load.


Therefore, I propose that we change the go command to explicitly reject any “module” that both contains no .go source files and lacks a go.mod file.

CC @rsc @jayconrod @heschik @hyangah @katiehockman @thepudds @marwan-at-work @ianthehat

bcmills avatar May 06 '19 19:05 bcmills

Sounds good to me as long as the Go command only checks for .go files and does not validate the import paths when a proxy runs go mod download $MODULE

For more on why: https://github.com/golang/go/issues/31458

marwan-at-work avatar May 06 '19 20:05 marwan-at-work

@marwan-at-work

as long as the Go command only checks for .go files and does not validate the import paths

For that do you mean something like #31662?

bcmills avatar May 06 '19 20:05 bcmills

@bcmills yes that issue would guarantee a proxy to go mod download a module via its VCS path and not its Vanity import path. I believe this already works as expected for my use case

Thanks!

marwan-at-work avatar May 06 '19 20:05 marwan-at-work

Another example use case is to deliver supporting scripts and assets with module. Not saying it is the greatest idea, but I included base Makefile from go/pkg/mod/path@ver/Makefile.

Such approach was working with vendor before.

I agree for such case it should be not a problem to at least add go.mod in the repo root.

vearutop avatar May 07 '19 05:05 vearutop

Actually in the proxy we am writing, we download the repository using the VCS, not using go get or go download. So, I don't know whether the repository is a useful go module or not, I let the proxy user's go decide on that. So actually, a go module without go files doesn't bother me at all. I'd rather not have to decide in the proxy whether a repository contains go files or not, so I am not in favor of this proposal.

beoran avatar May 07 '19 05:05 beoran

3.1 GB of data in go/pkg/mod

Got bit by this today while trying to find out why my disk space was gone. The biggest offender seems to be the linux kernel tree. Like the actual upstream linux kernel somehow.

Xe avatar May 30 '19 17:05 Xe

This is probably a stupid question, but:

If you don't download it, how do you know what it contains?

seebs avatar Feb 03 '21 18:02 seebs

I think I remember people getting around the no reliable path problem by invoking go mod vendor so dependencies can be located in vendor/

seankhliao avatar Feb 03 '21 18:02 seankhliao

On the other hand, module proxies tend to rely on the go command to decide what is or is not a valid module, and accepting arbitrary non-Go repositories potentially exposes such proxies to a significant amount of additional load.

As @seebs pointed out, unless there is a magic git or vcs command to do this cheaply without cloning, module proxy already has downloaded the repository when the go mod download concluded there was no go code. (still there is a question on whether it's acceptable for the module proxy to stop serving such modules, but that is a separate issue).

If this proposal is accepted and implemented, I think module proxies need a way to distinguish this mode of failure from other go command failures (network issue, etc) so they know that they don't need to retry and download the requested versions again in the future.

hyangah avatar Nov 29 '22 21:11 hyangah

I think the proxy should be able to use the Origin and Reuse fields (added to go mod download -json for #53644) for this case.

bcmills avatar Nov 29 '22 21:11 bcmills

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc avatar Nov 30 '22 20:11 rsc

I agree that Origin and Reuse should provide enough information.

rsc avatar Dec 07 '22 18:12 rsc

Does anyone object to doing this? (It's hard to imagine why anyone would object given that there is no Go code in these "modules".)

rsc avatar Dec 07 '22 18:12 rsc

I thought of one possible complication.

If the module at the root of a repo is carved up into nested modules, and the carved-out modules do not leave any packages in the original root module, it could be left with no Go source files, but still needed in order to check for import-path collisions. (That is: we need to module's source code only in order to verify that it does not contain the packages loaded from nested modules.)

The root module must continue to exist, because modules that require versions of it from before the split must be able to upgrade to remove the root-module packages.

Finally, if the root module was at a +incompatible version, its maintainers cannot add a go.mod file to make it a valid module, because that would make the +incompatible versions invalid.

I think it would still be possible to add a .go file with //go:build ignore at the repo root to keep the module valid (the import path at the repo root necessarily cannot conflict with any nested module), but that is kind of a funky workaround and would potentially invalidate empty-repo-root modules that were previously valid.

On a practical note, https://github.com/Azure/azure-sdk-for-go may be very close to that exact situation. 😅

bcmills avatar Dec 07 '22 18:12 bcmills

Having to create a .go file seems like a useful signal in this case.

rsc avatar Dec 14 '22 18:12 rsc

Does anyone object to accepting this proposal?

rsc avatar Dec 21 '22 18:12 rsc

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc avatar Jan 04 '23 19:01 rsc

A couple of people from the Gentoo Foundation were asking some seemingly related questions in #51284, including https://github.com/golang/go/issues/51284#issuecomment-1334221407.

It might be worthwhile for someone from the core Go team to make a brief comment there, including in the context of this proposal here.

thepudds avatar Jan 10 '23 12:01 thepudds

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

rsc avatar Jan 11 '23 19:01 rsc

This blog post is making the rounds that explores this issue and showcases how the go module proxy could be used for arbitrary storage or as a command and control infra for malware. It's on the HN frontpage at the time I'm writing this, so this is getting a decent amount of exposure.

Realistically it's trivial to still do the same thing and add a go.mod and still have the proxy cache it, so I don't imagine this changes things much, or any urgency around resolving this.

The only reason I bring it up is that due to that post there might be some renewed interest in this issue and people trying to pull a few more shenanigans.

daenney avatar May 25 '24 16:05 daenney