proposal: cmd/go: do not download “modules” that contain no go.mod or *.go
At the moment, go mod download will happily try to extract and download any arbitrary repository as long as it can be resolved by some means (through a hard-coded hosting service such as github.com, or using a distinguished extension like .git), even if it does not contain anything even marginally related to building Go code.
I am not aware of any reasonable use-case for such a repository:
-
It's not useful for storing test data, because we currently provide no mechanism for the tests to actually locate that data. (Modules are not guaranteed to be loaded from the module cache — for example, they might be subject to a
replacedirective — and since the test itself is run within the directory containing its source code, it has no way to locate the data or rungo listwithin the module that invoked it.) -
It's not useful for C headers (for use with
cgo), for the same reason. -
It might be useful for fetching non-Go inputs to
go generate: in theory, the generator could rungo mod download $MODULEto locate the sources at the required version. But the output ofgo generateis intended to be checked in anyway, which makes the use of modules somewhat spurious: if an explicit version of the non-Go inputs appears in the module's requirements, then everyone using the generated package will have an extra module to fetch that is guaranteed to have no effect on the build, and in most cases thego generateprogram can just as easilygit clone(or similar) the input data at a specific revision.
Furthermore, if someone did find a way to make modules without Go source code useful (for the above use-cases or others), it's trivial to add a go.mod file to indicate that the repository really is somehow intended for use with Go source code. (We need to support go.mod-only modules anyway, since they can arise naturally when splitting a large root module into smaller nested modules.)
On the other hand, module proxies tend to rely on the go command to decide what is or is not a valid module, and accepting arbitrary non-Go repositories potentially exposes such proxies to a significant amount of additional load.
Therefore, I propose that we change the go command to explicitly reject any “module” that both contains no .go source files and lacks a go.mod file.
CC @rsc @jayconrod @heschik @hyangah @katiehockman @thepudds @marwan-at-work @ianthehat
Sounds good to me as long as the Go command only checks for .go files and does not validate the import paths when a proxy runs go mod download $MODULE
For more on why: https://github.com/golang/go/issues/31458
@marwan-at-work
as long as the Go command only checks for
.gofiles and does not validate the import paths
For that do you mean something like #31662?
@bcmills yes that issue would guarantee a proxy to go mod download a module via its VCS path and not its Vanity import path. I believe this already works as expected for my use case
Thanks!
Another example use case is to deliver supporting scripts and assets with module. Not saying it is the greatest idea, but I included base Makefile from go/pkg/mod/path@ver/Makefile.
Such approach was working with vendor before.
I agree for such case it should be not a problem to at least add go.mod in the repo root.
Actually in the proxy we am writing, we download the repository using the VCS, not using go get or go download. So, I don't know whether the repository is a useful go module or not, I let the proxy user's go decide on that. So actually, a go module without go files doesn't bother me at all. I'd rather not have to decide in the proxy whether a repository contains go files or not, so I am not in favor of this proposal.
Got bit by this today while trying to find out why my disk space was gone. The biggest offender seems to be the linux kernel tree. Like the actual upstream linux kernel somehow.
This is probably a stupid question, but:
If you don't download it, how do you know what it contains?
I think I remember people getting around the no reliable path problem by invoking go mod vendor so dependencies can be located in vendor/
On the other hand, module proxies tend to rely on the go command to decide what is or is not a valid module, and accepting arbitrary non-Go repositories potentially exposes such proxies to a significant amount of additional load.
As @seebs pointed out, unless there is a magic git or vcs command to do this cheaply without cloning, module proxy already has downloaded the repository when the go mod download concluded there was no go code. (still there is a question on whether it's acceptable for the module proxy to stop serving such modules, but that is a separate issue).
If this proposal is accepted and implemented, I think module proxies need a way to distinguish this mode of failure from other go command failures (network issue, etc) so they know that they don't need to retry and download the requested versions again in the future.
I think the proxy should be able to use the Origin and Reuse fields (added to go mod download -json for #53644) for this case.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
I agree that Origin and Reuse should provide enough information.
Does anyone object to doing this? (It's hard to imagine why anyone would object given that there is no Go code in these "modules".)
I thought of one possible complication.
If the module at the root of a repo is carved up into nested modules, and the carved-out modules do not leave any packages in the original root module, it could be left with no Go source files, but still needed in order to check for import-path collisions. (That is: we need to module's source code only in order to verify that it does not contain the packages loaded from nested modules.)
The root module must continue to exist, because modules that require versions of it from before the split must be able to upgrade to remove the root-module packages.
Finally, if the root module was at a +incompatible version, its maintainers cannot add a go.mod file to make it a valid module, because that would make the +incompatible versions invalid.
I think it would still be possible to add a .go file with //go:build ignore at the repo root to keep the module valid (the import path at the repo root necessarily cannot conflict with any nested module), but that is kind of a funky workaround and would potentially invalidate empty-repo-root modules that were previously valid.
―
On a practical note, https://github.com/Azure/azure-sdk-for-go may be very close to that exact situation. 😅
Having to create a .go file seems like a useful signal in this case.
Does anyone object to accepting this proposal?
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
A couple of people from the Gentoo Foundation were asking some seemingly related questions in #51284, including https://github.com/golang/go/issues/51284#issuecomment-1334221407.
It might be worthwhile for someone from the core Go team to make a brief comment there, including in the context of this proposal here.
No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group
This blog post is making the rounds that explores this issue and showcases how the go module proxy could be used for arbitrary storage or as a command and control infra for malware. It's on the HN frontpage at the time I'm writing this, so this is getting a decent amount of exposure.
Realistically it's trivial to still do the same thing and add a go.mod and still have the proxy cache it, so I don't imagine this changes things much, or any urgency around resolving this.
The only reason I bring it up is that due to that post there might be some renewed interest in this issue and people trying to pull a few more shenanigans.