storage: `NewReader()` returns outdated objects
Client
Storage
Environment
Local
Go Environment
$ go version
$ go env
go version go1.22.1 darwin/arm64
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/wjkoh/Library/Caches/go-build'
GOENV='/Users/wjkoh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/wjkoh/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/wjkoh/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.1/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.1'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/wjkoh/bake/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/y1/jsmqrm1s6y55qh9s2rmw39lh0000gn/T/go-build2463423068=/tmp/go-build -gno-record-gcc-switches -fno-common'
Code
e.g.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
	"github.com/google/uuid"
)

func main() {
	const bucket = "some-public-bucket"
	object := uuid.New().String()

	ctx := context.Background()
	storageClient, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Create a test object.
	w := storageClient.Bucket(bucket).Object(object).NewWriter(ctx)
	if _, err := w.Write([]byte("foobar 1")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	// First read.
	reader1, err := storageClient.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from reader1.NewReader(): %d", reader1.Attrs.Generation)
	if err := reader1.Close(); err != nil {
		log.Fatal(err)
	}

	attrs, err := storageClient.Bucket(bucket).Object(object).Attrs(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from Attrs(): %d", attrs.Generation)

	// Overwrite the test object to change its generation.
	w = storageClient.Bucket(bucket).Object(object).NewWriter(ctx)
	if _, err := w.Write([]byte("foobar 2")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	// This delay didn't change the behavior (requires importing "time"):
	// time.Sleep(10 * time.Second)

	// Second read.
	reader2, err := storageClient.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from reader2.NewReader(): %d", reader2.Attrs.Generation)
	if err := reader2.Close(); err != nil {
		log.Fatal(err)
	}

	attrs, err = storageClient.Bucket(bucket).Object(object).Attrs(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Generation from Attrs(): %d", attrs.Generation)
}
Expected behavior
I expected the generation numbers from NewReader().Attrs.Generation and object.Attrs().Generation to match.
Actual behavior
However, after overwriting the test object, the generation number reported by ReaderObjectAttrs is stale, while the one from object.Attrs() is current. In the output below, the second reader reports generation 1712996651719769, whereas 1712996653117313 was expected.
2024/04/13 17:24:12 Generation from reader1.NewReader(): 1712996651719769
2024/04/13 17:24:12 Generation from Attrs(): 1712996651719769
2024/04/13 17:24:13 Generation from reader2.NewReader(): 1712996651719769
2024/04/13 17:24:13 Generation from Attrs(): 1712996653117313
Additional context
cloud.google.com/go/storage v1.40.0
I have identified the cause of the issue. My test bucket was public, so its Cache-Control was set to public, max-age=3600 by default, as described in https://cloud.google.com/storage/docs/caching#built-in_caching_for. It appears that NewReader() honors this cache setting, which was unexpected. I had assumed that Cache-Control, being an HTTP header, only applies to HTTP requests, such as fetching an object's public or signed URL with a GET request. It is unclear why the Go client library is also affected by this HTTP caching, since users of the library may not be aware of the underlying network protocol (e.g. HTTP, gRPC, or something else). Furthermore, the documentation does not mention that the HTTP Cache-Control setting can affect a reader created through NewReader().
Is there a way to completely disable caching for the Go client library? Alternatively, could we at least add a note to NewReader() informing users that caching is applied and they may encounter outdated objects?
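As a point of reference, a per-object workaround is to clear or tighten the object's Cache-Control metadata via ObjectHandle.Update, so intermediate caches should stop serving stale copies. This is a sketch with placeholder bucket and object names, not code from this thread:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// "some-public-bucket" and "my-object" are placeholders.
	obj := client.Bucket("some-public-bucket").Object("my-object")

	// Replace the default public, max-age=3600 with no-store so
	// responses for this object are not cached.
	attrs, err := obj.Update(ctx, storage.ObjectAttrsToUpdate{
		CacheControl: "no-store",
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Cache-Control is now %q", attrs.CacheControl)
}
```

Note that this changes the object's metadata for all consumers, including browser GETs of the public URL, which may not be desirable for a genuinely public bucket.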
Thanks for the issue, @wjkoh.
The Go library does use HTTP requests (the XML API by default for downloads), so a download is equivalent to a GET request via the browser. We also offer support for the JSON API (also HTTP-based) as well as gRPC (this is new and still behind an allowlist). We can potentially add a note about this to the docs to clarify, but I don't think it is possible to disable cache-control for HTTP requests on public objects (unless you disable cache-control on that object entirely).
Can you confirm if doing the download with https://pkg.go.dev/cloud.google.com/go/storage#WithJSONReads set yields the same result?
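A minimal sketch of that check, assuming the storage.WithJSONReads client option (available in recent versions of cloud.google.com/go/storage); bucket and object names are placeholders, not values from this thread:

```go
package main

import (
	"context"
	"io"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	// WithJSONReads routes downloads through the JSON API instead
	// of the default XML API.
	client, err := storage.NewClient(ctx, storage.WithJSONReads())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder names.
	r, err := client.Bucket("some-public-bucket").Object("my-object").NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	log.Printf("Generation from NewReader(): %d", r.Attrs.Generation)
	if _, err := io.Copy(io.Discard, r); err != nil {
		log.Fatal(err)
	}
}
```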
Thank you for your response, @tritone. I now understand the behavior, but could you please explain why client.Bucket(bucket).Object(object).Attrs(ctx) returns uncached information? This discrepancy was one of the factors that caused confusion and led to a bug in our code. I will test out WithJSONReads today and provide an update.
@wjkoh there are a couple possible reasons:
- ObjectHandle.Attrs() uses the JSON API (as do all other library methods aside from downloads using XML), so it's a different endpoint.
- The Attrs call is a metadata request rather than a data request (no alt=media). I'm not sure how that behaves w.r.t. cache control and would need to test it out to understand better.
Overall if you're concerned about consistency between those two methods I think it would be a good idea to switch to using JSON downloads. We are planning on switching that to be the library default in some future release.
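As a side note, another way to keep a read consistent with a preceding Attrs() call is to pin the reader to the generation that Attrs() returned, via ObjectHandle.Generation. This is a sketch with placeholder names, not something proposed in the thread, and whether a generation-pinned request actually bypasses the edge cache has not been verified here; at minimum it makes a silent generation mismatch detectable:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Placeholder names.
	obj := client.Bucket("some-public-bucket").Object("my-object")

	// Fetch current metadata (JSON API), then request that exact
	// generation so the reader is tied to the same object version.
	attrs, err := obj.Attrs(ctx)
	if err != nil {
		log.Fatal(err)
	}
	r, err := obj.Generation(attrs.Generation).NewReader(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	log.Printf("Reading generation %d", r.Attrs.Generation)
}
```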
@tritone @wjkoh
I have confirmed the WithJSONReads option solves this issue.
Thank you for your input, @tritone and @hkawa608! After running into this issue, I decided to abandon the idea of using Cloud Storage for real-time data sharing among our nodes, so unfortunately I did not follow up on it. At this point, I no longer have a vested interest in this matter. What would be the most suitable solution until the switch to JSON downloads by default occurs? I believe simply including a warning in the documentation would suffice.