go
go copied to clipboard
cmd/go: build with vendor depends on modcache for checksums
What version of Go are you using (go version
)?
$ go version go version devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/home/arccy/.cache/go-build" GOENV="/home/arccy/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOINSECURE="" GOMODCACHE="/tmp/gomodcache.f4Wp" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/arccy/.data/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/home/arccy/sdk/gotip" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/home/arccy/sdk/gotip/pkg/tool/linux_amd64" GOVCS="" GOVERSION="devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="/home/arccy/tmp/trivy-mod-parse/go.mod" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3032020717=/tmp/go-build -gno-record-gcc-switches"
What did you do?
build a module with a vendor dir. Whether or not the module sums are embedded depends on the module cache
$ export GOMODCACHE=$(mktemp -d -t gomodcache.XXXX)
$ go build -mod=vendor
$ go version -m trivy-mod-parse
trivy-mod-parse: devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000
path github.com/ebati/trivy-mod-parse
mod github.com/ebati/trivy-mod-parse (devel)
dep github.com/davecgh/go-spew v1.1.1
dep github.com/go-sql-driver/mysql v0.0.0-00010101000000-000000000000
=> github.com/go-sql-driver/mysql v1.5.0
$ go mod tidy
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/go-sql-driver/mysql v1.5.0
$ go build -mod=vendor
$ go version -m trivy-mod-parse
trivy-mod-parse: devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000
path github.com/ebati/trivy-mod-parse
mod github.com/ebati/trivy-mod-parse (devel)
dep github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
dep github.com/go-sql-driver/mysql v0.0.0-00010101000000-000000000000
=> github.com/go-sql-driver/mysql v1.5.0 h1:ozyZYNQW3x3HtqT1jira07DN2PArx2v7/mN66gGcHOs=
What did you expect to see?
build output not affected by module cache contents
What did you see instead?
checksums only embedded when module cache is populated
cc @bcmills @jayconrod @matloob
This is quite the issue for us, since there is also no way to skip adding the module info to the binary.
If your go.sum
file is tidy, go build -mod=vendor
should be using the checksums from there.
If it isn't tidy (missing entries), perhaps we should emit a clear error message and fail the build with -mod=vendor
?
(FWIW, I suspect that the issue in the original post is due to a bad interaction with replace
.)
clarifying: the issue (or one of the issues) is that you can go mod vendor
, edit the dependencies (or manually put any new ones) in ./vendor
, and then go build
, and go version
will still show the the dependencies as "legit" (without the manual edits)
@bcmills the file is tidy, the only difference is having the dependencies in your ~/go folder (which will be filled when you run go mod tidy
) and your vendor folder or only in your vendor folder. Here is the go version -m
output on a build when having the dependencies in ~/go:
entc: go1.18
path command-line-arguments
dep ariga.io/atlas v0.8.0 h1:iSNmWLCoI1i4Zu9Gqnu8iWgO4ZLNcR42/d6r78DHMQI=
dep entgo.io/contrib v0.3.3-0.20220909110541-462c4e569873 h1:VqkwKyPnxLSoYJdjT6QWzoh4Be2QOB2JF2rUJus4EpY=
dep entgo.io/ent v0.11.4-0.20221001062602-1029a2d3ba2a h1:T28WZZUdeJb7DQVQNnZkr3pasIdDrDVC41eQIV0hvTU=
dep github.com/99designs/gqlgen v0.17.20 h1:O7WzccIhKB1dm+7g6dhQcULINftfiLSBg2l/mwbpJMw=
dep github.com/agext/levenshtein v1.2.1 h1:QmvMAjj2aEICytGiWzmxoE0x2KZvE0fvmqMOfy2tjT8=
dep github.com/agnivade/levenshtein v1.1.1 h1:QY8M92nrzkmr798gCo3kmMyqXFzdQVpxLlGPRBij0P8=
dep github.com/apparentlymart/go-textseg/v13 v13.0.0 h1:Y+KvPE1NYz0xl601PVImeQfFyEy6iT90AvPUL1NNfNw=
dep github.com/fsnotify/fsnotify v1.5.4 h1:jRbGcIw6P2Meqdwuo0H1p6JVLbL5DHKAKlYndzMwVZI=
dep github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
dep github.com/gin-gonic/gin v1.8.1 h1:4+fr/el88TOO3ewCmQr8cx/CtZ/umlIRIs5M4NTNjf8=
dep github.com/go-logr/logr v1.2.3 h1:2DntVwHkVopvECVRSlL5PSo9eG+cAkDCuckLubN+rq0=
dep github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
... etc ...
And here the go version -m
output on a build after having removed ~/go:
entc: go1.18
path command-line-arguments
dep ariga.io/atlas v0.8.0
dep entgo.io/contrib v0.3.3-0.20220909110541-462c4e569873
dep entgo.io/ent v0.11.4-0.20221001062602-1029a2d3ba2a
dep github.com/99designs/gqlgen v0.17.20
dep github.com/agext/levenshtein v1.2.1
dep github.com/agnivade/levenshtein v1.1.1
dep github.com/apparentlymart/go-textseg/v13 v13.0.0
dep github.com/fsnotify/fsnotify v1.5.4
dep github.com/gin-contrib/sse v0.1.0
dep github.com/gin-gonic/gin v1.8.1
dep github.com/go-logr/logr v1.2.3
dep github.com/go-logr/stdr v1.2.2
... etc ...
Conclusion: when you don't have the dependencies in ~/go, it will not insert the sums.
I did some more research into this, the build info is built by this function: https://github.com/golang/go/blob/master/src/cmd/go/internal/load/pkg.go#L2269
It uses modfetch.Sum
to get the Sum of a module, but modfetch.Sum
only returns a Sum if it's available in the cache, it also says so in the comments:
// Sum returns the checksum for the downloaded copy of the given module,
// if present in the download cache.
https://github.com/golang/go/blob/599a1e40c6dc4d02e2d0ae5a22659b49dc327e40/src/cmd/go/internal/modfetch/fetch.go#L732
I have added some debugging and the returned ziphash path is something like this:
/home/jerbob92/go/pkg/mod/cache/download/gopkg.in/yaml.v3/@v/v3.0.1.ziphash
It then tries to open that file and doesn't exist ofcourse to then lockedfile.Read
results in an error:
lockedfile error for gopkg.in/[email protected]: open /home/jerbob92/go/pkg/mod/cache/download/gopkg.in/yaml.v3/@v/v3.0.1.ziphash: no such file or directory
Aha! I have been hunting this down for a little while, started with what I thought was bugs in SBoM software, eventually leading here.
Is it fair to say that this is an error in how it processes and loads the modules? One of the points of go.sum
is to validate that this desired version is the actual version being used (by checking sha sums), but since it has it (in this case from vendor/
) and builds with it, should it not be able to do one of the following?
- take the sha from the
go.sum
file; OR - take the sha from the
vendor/
directory (or however it is using the content there to validate that it fits thego.sum
hash)
In my case, I cannot run go mod tidy
before running go build
, as I am relying on vendor/
to build offline.
In any case, is there some way I can assist?
vendor directories don't contain enough data to verify against checksums, ref #27348
How interesting. Thanks for the helpful reference.
Does that mean that when I go build -mod=vendor
that it does not validate hashes from go.sum
, unlike a non-vendored go build
? I guess it must, based on the linked issue.
So what is the right approach for combining:
- Module information embedding provided in the build and hash sums
- Vendor preset and predownloaded modules (possibly with the assumption of no network access during build)?
Probably for now when we are building with -mod=vendor
we should embed the module information (from vendor/modules.txt
) but not the checksum information (because we don't in general know that the vendor
directory still matches the checksums).
@bcmills sounds like a good compromise yes
@bcmills does it actually verify it for non-vendor builds though?
I also think that not embedding the checksums with mod=vendor
(the default when there is a vendor directory) is a good compromise for now.
does it actually verify it for non-vendor builds though?
For non-vendor
builds, checksums are verified when modules are first downloaded to the module cache and when the user invokes go mod verify
explicitly. (We trust that once a module has been downloaded it will not be modified, but of course it's fine to run go mod verify
explicitly to check for filesystem corruption.)
@bcmills If it doesn't verify it at build, adding the ziphash guarantees about as much as the go.sum hash, so in that case you might not want to add the checksum at all. Because why trust the user not editing the cache folder, but not trust it with not editing the vendor folder?
@jerbob92 captured my thoughts really well (thanks).
If when we build w/o -mod=vendor
we verify checksums build time but not when we build with -mod=vendor
, then I could see a good reason to have a difference between including checksums and not.
However, if the build-time verification is identical - no verification - but just download time, then we should include the checksums both times as well.
An alternative might be to verify build time for both, and therefore include enough in vendor/
to be able to verify the checksums, but I am aware that is a pretty significant change to how vendoring works (and growth in size of every vendor/
dir).
Beyond the "how do we fix this", what is the right approach using the current state of tooling to get checksums into the compiled binary? Is it to ensure that the go mod cache is properly populated before build, even if vendor/
is in place? So if I clone some repo that has checked-in vendor/
, I run go mod download
or similar before doing build? And if so, how do I handle offline builds (which is one of the key reasons people use vendor/
)?
@jerbob92
why trust the user not editing the cache folder, but not trust it with not editing the vendor folder?
The module cache is per-user (or per-machine), verified at download time, and by default has read-only permissions. The vendor folder is generally committed to a shared repository, is not verified, and has the same permissions as other ordinary files in the repo.
They are not equivalent.
They are not equivalent
That is fair enough. Your point about, "we shouldn't include the hash unless we can verify with what we have at hand" is eminently sensible, although silently not verifying at all if there is no cache is a bit strange behaviour. It means that go build
and go build -mod=vendor
provide fundamentally different guarantees (or one provides and one does not, or more correctly, one provides always, and one provides only when the go mod cache is there and ignores it when not). That would surprise users, I would think (well, at least to me 😁 ).
What would be the right path to getting equivalency? And what would be the right path to using and verifying and including hashes in vendor/
ed environments, especially when there is no ability to download the cache go mod download
prior to go build -mod=vendor
?
@samthanawalla, this might be another good one for you to look into while you're working on #52792.
Change https://go.dev/cl/564195 mentions this issue: cmd/go: Do not embed checksums when building with vendor.