go icon indicating copy to clipboard operation
go copied to clipboard

cmd/go: build with vendor depends on modcache for checksums

Open seankhliao opened this issue 3 years ago • 18 comments

What version of Go are you using (go version)?

$ go version
go version devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/arccy/.cache/go-build"
GOENV="/home/arccy/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/tmp/gomodcache.f4Wp"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/arccy/.data/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/arccy/sdk/gotip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/arccy/sdk/gotip/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/arccy/tmp/trivy-mod-parse/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3032020717=/tmp/go-build -gno-record-gcc-switches"

What did you do?

build a module with a vendor dir. Whether or not the module sums are embedded depends on the module cache

$ export GOMODCACHE=$(mktemp -d -t gomodcache.XXXX)

$ go build -mod=vendor
$ go version -m trivy-mod-parse
trivy-mod-parse: devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000
	path	github.com/ebati/trivy-mod-parse
	mod	github.com/ebati/trivy-mod-parse	(devel)	
	dep	github.com/davecgh/go-spew	v1.1.1	
	dep	github.com/go-sql-driver/mysql	v0.0.0-00010101000000-000000000000
	=>	github.com/go-sql-driver/mysql	v1.5.0	

$ go mod tidy
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/go-sql-driver/mysql v1.5.0

$ go build -mod=vendor
$ go version -m trivy-mod-parse
trivy-mod-parse: devel go1.17-e4615ad Wed May 26 13:25:43 2021 +0000
	path	github.com/ebati/trivy-mod-parse
	mod	github.com/ebati/trivy-mod-parse	(devel)	
	dep	github.com/davecgh/go-spew	v1.1.1	h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
	dep	github.com/go-sql-driver/mysql	v0.0.0-00010101000000-000000000000
	=>	github.com/go-sql-driver/mysql	v1.5.0	h1:ozyZYNQW3x3HtqT1jira07DN2PArx2v7/mN66gGcHOs=

What did you expect to see?

build output not affected by module cache contents

What did you see instead?

checksums only embedded when module cache is populated


cc @bcmills @jayconrod @matloob

seankhliao avatar May 26 '21 16:05 seankhliao

This is quite the issue for us, since there is also no way to skip adding the module info to the binary.

jerbob92 avatar Oct 26 '22 14:10 jerbob92

If your go.sum file is tidy, go build -mod=vendor should be using the checksums from there. If it isn't tidy (missing entries), perhaps we should emit a clear error message and fail the build with -mod=vendor?

(FWIW, I suspect that the issue in the original post is due to a bad interaction with replace.)

bcmills avatar Oct 26 '22 17:10 bcmills

clarifying: the issue (or one of the issues) is that you can go mod vendor, edit the dependencies (or manually put any new ones) in ./vendor, and then go build, and go version will still show the the dependencies as "legit" (without the manual edits)

caarlos0 avatar Oct 26 '22 17:10 caarlos0

@bcmills the file is tidy, the only difference is having the dependencies in your ~/go folder (which will be filled when you run go mod tidy) and your vendor folder or only in your vendor folder. Here is the go version -m output on a build when having the dependencies in ~/go:

entc: go1.18
	path	command-line-arguments
	dep	ariga.io/atlas	v0.8.0	h1:iSNmWLCoI1i4Zu9Gqnu8iWgO4ZLNcR42/d6r78DHMQI=
	dep	entgo.io/contrib	v0.3.3-0.20220909110541-462c4e569873	h1:VqkwKyPnxLSoYJdjT6QWzoh4Be2QOB2JF2rUJus4EpY=
	dep	entgo.io/ent	v0.11.4-0.20221001062602-1029a2d3ba2a	h1:T28WZZUdeJb7DQVQNnZkr3pasIdDrDVC41eQIV0hvTU=
	dep	github.com/99designs/gqlgen	v0.17.20	h1:O7WzccIhKB1dm+7g6dhQcULINftfiLSBg2l/mwbpJMw=
	dep	github.com/agext/levenshtein	v1.2.1	h1:QmvMAjj2aEICytGiWzmxoE0x2KZvE0fvmqMOfy2tjT8=
	dep	github.com/agnivade/levenshtein	v1.1.1	h1:QY8M92nrzkmr798gCo3kmMyqXFzdQVpxLlGPRBij0P8=
	dep	github.com/apparentlymart/go-textseg/v13	v13.0.0	h1:Y+KvPE1NYz0xl601PVImeQfFyEy6iT90AvPUL1NNfNw=
	dep	github.com/fsnotify/fsnotify	v1.5.4	h1:jRbGcIw6P2Meqdwuo0H1p6JVLbL5DHKAKlYndzMwVZI=
	dep	github.com/gin-contrib/sse	v0.1.0	h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
	dep	github.com/gin-gonic/gin	v1.8.1	h1:4+fr/el88TOO3ewCmQr8cx/CtZ/umlIRIs5M4NTNjf8=
	dep	github.com/go-logr/logr	v1.2.3	h1:2DntVwHkVopvECVRSlL5PSo9eG+cAkDCuckLubN+rq0=
	dep	github.com/go-logr/stdr	v1.2.2	h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=

... etc ...

And here the go version -m output on a build after having removed ~/go:

entc: go1.18
	path	command-line-arguments
	dep	ariga.io/atlas	v0.8.0	
	dep	entgo.io/contrib	v0.3.3-0.20220909110541-462c4e569873	
	dep	entgo.io/ent	v0.11.4-0.20221001062602-1029a2d3ba2a	
	dep	github.com/99designs/gqlgen	v0.17.20	
	dep	github.com/agext/levenshtein	v1.2.1	
	dep	github.com/agnivade/levenshtein	v1.1.1	
	dep	github.com/apparentlymart/go-textseg/v13	v13.0.0	
	dep	github.com/fsnotify/fsnotify	v1.5.4	
	dep	github.com/gin-contrib/sse	v0.1.0	
	dep	github.com/gin-gonic/gin	v1.8.1	
	dep	github.com/go-logr/logr	v1.2.3	
	dep	github.com/go-logr/stdr	v1.2.2	

... etc ...

Conclusion: when you don't have the dependencies in ~/go, it will not insert the sums.

jerbob92 avatar Oct 26 '22 17:10 jerbob92

I did some more research into this, the build info is built by this function: https://github.com/golang/go/blob/master/src/cmd/go/internal/load/pkg.go#L2269

It uses modfetch.Sum to get the Sum of a module, but modfetch.Sum only returns a Sum if it's available in the cache, it also says so in the comments:

// Sum returns the checksum for the downloaded copy of the given module,
// if present in the download cache.

https://github.com/golang/go/blob/599a1e40c6dc4d02e2d0ae5a22659b49dc327e40/src/cmd/go/internal/modfetch/fetch.go#L732

I have added some debugging and the returned ziphash path is something like this: /home/jerbob92/go/pkg/mod/cache/download/gopkg.in/yaml.v3/@v/v3.0.1.ziphash

It then tries to open that file and doesn't exist ofcourse to then lockedfile.Read results in an error:

lockedfile error for gopkg.in/[email protected]: open /home/jerbob92/go/pkg/mod/cache/download/gopkg.in/yaml.v3/@v/v3.0.1.ziphash: no such file or directory

jerbob92 avatar Oct 27 '22 09:10 jerbob92

Aha! I have been hunting this down for a little while, started with what I thought was bugs in SBoM software, eventually leading here.

Is it fair to say that this is an error in how it processes and loads the modules? One of the points of go.sum is to validate that this desired version is the actual version being used (by checking sha sums), but since it has it (in this case from vendor/) and builds with it, should it not be able to do one of the following?

  • take the sha from the go.sum file; OR
  • take the sha from the vendor/ directory (or however it is using the content there to validate that it fits the go.sum hash)

In my case, I cannot run go mod tidy before running go build, as I am relying on vendor/ to build offline.

In any case, is there some way I can assist?

deitch avatar Oct 28 '22 07:10 deitch

vendor directories don't contain enough data to verify against checksums, ref #27348

seankhliao avatar Oct 28 '22 12:10 seankhliao

How interesting. Thanks for the helpful reference.

Does that mean that when I go build -mod=vendor that it does not validate hashes from go.sum, unlike a non-vendored go build? I guess it must, based on the linked issue.

So what is the right approach for combining:

  • Module information embedding provided in the build and hash sums
  • Vendor preset and predownloaded modules (possibly with the assumption of no network access during build)?

deitch avatar Oct 28 '22 12:10 deitch

Probably for now when we are building with -mod=vendor we should embed the module information (from vendor/modules.txt) but not the checksum information (because we don't in general know that the vendor directory still matches the checksums).

bcmills avatar Oct 28 '22 14:10 bcmills

@bcmills sounds like a good compromise yes

caarlos0 avatar Oct 28 '22 14:10 caarlos0

@bcmills does it actually verify it for non-vendor builds though? I also think that not embedding the checksums with mod=vendor (the default when there is a vendor directory) is a good compromise for now.

jerbob92 avatar Oct 28 '22 19:10 jerbob92

does it actually verify it for non-vendor builds though?

For non-vendor builds, checksums are verified when modules are first downloaded to the module cache and when the user invokes go mod verify explicitly. (We trust that once a module has been downloaded it will not be modified, but of course it's fine to run go mod verify explicitly to check for filesystem corruption.)

bcmills avatar Oct 28 '22 20:10 bcmills

@bcmills If it doesn't verify it at build, adding the ziphash guarantees about as much as the go.sum hash, so in that case you might not want to add the checksum at all. Because why trust the user not editing the cache folder, but not trust it with not editing the vendor folder?

jerbob92 avatar Oct 28 '22 21:10 jerbob92

@jerbob92 captured my thoughts really well (thanks).

If when we build w/o -mod=vendor we verify checksums build time but not when we build with -mod=vendor, then I could see a good reason to have a difference between including checksums and not.

However, if the build-time verification is identical - no verification - but just download time, then we should include the checksums both times as well.

An alternative might be to verify build time for both, and therefore include enough in vendor/ to be able to verify the checksums, but I am aware that is a pretty significant change to how vendoring works (and growth in size of every vendor/ dir).

Beyond the "how do we fix this", what is the right approach using the current state of tooling to get checksums into the compiled binary? Is it to ensure that the go mod cache is properly populated before build, even if vendor/ is in place? So if I clone some repo that has checked-in vendor/, I run go mod download or similar before doing build? And if so, how do I handle offline builds (which is one of the key reasons people use vendor/)?

deitch avatar Oct 30 '22 09:10 deitch

@jerbob92

why trust the user not editing the cache folder, but not trust it with not editing the vendor folder?

The module cache is per-user (or per-machine), verified at download time, and by default has read-only permissions. The vendor folder is generally committed to a shared repository, is not verified, and has the same permissions as other ordinary files in the repo.

They are not equivalent.

bcmills avatar Nov 01 '22 14:11 bcmills

They are not equivalent

That is fair enough. Your point about, "we shouldn't include the hash unless we can verify with what we have at hand" is eminently sensible, although silently not verifying at all if there is no cache is a bit strange behaviour. It means that go build and go build -mod=vendor provide fundamentally different guarantees (or one provides and one does not, or more correctly, one provides always, and one provides only when the go mod cache is there and ignores it when not). That would surprise users, I would think (well, at least to me 😁 ).

What would be the right path to getting equivalency? And what would be the right path to using and verifying and including hashes in vendor/ed environments, especially when there is no ability to download the cache go mod download prior to go build -mod=vendor?

deitch avatar Nov 01 '22 14:11 deitch

@samthanawalla, this might be another good one for you to look into while you're working on #52792.

bcmills avatar Feb 08 '24 19:02 bcmills

Change https://go.dev/cl/564195 mentions this issue: cmd/go: Do not embed checksums when building with vendor.

gopherbot avatar Feb 14 '24 21:02 gopherbot