storage: panic: slice bounds out of range in gRPCWriter.uploadBuffer
Client
Storage gRPC Client
Description
While uploading a file using the cloud.google.com/go/storage SDK, a runtime error: slice bounds out of range panic occurred within the gRPCWriter.uploadBuffer function. The panic originates from a goroutine created internally by the SDK, which makes it difficult for application code to recover from.
Error Log
panic: runtime error: slice bounds out of range [:-199229440]
goroutine 29284526 [running]:
cloud.google.com/go/storage.(*gRPCWriter).uploadBuffer(0xc1e9b00240, 0x856f5a, 0xc000000, 0x1)
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:2123 +0xbcd
cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter.func1()
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:1223 +0x130
created by cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter in goroutine 150
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:1185 +0x42e
Steps to Reproduce
The exact steps to reproduce are difficult to pinpoint. This error had not occurred in the previous 6 months of operation, during which our workload has averaged 450 MB/s of uploads each day. The issue might be related to large file uploads or unstable network conditions.
Potential Problem Area and Hypothesis
According to the error log, the panic occurred at line 2123 in the cloud.google.com/go/[email protected]/grpc_client.go file:
// ...
// Prepare chunk section for upload.
data := toWrite[sent : sent+bytesToSendInCurrReq] // grpc_client.go:2123
// ...
It appears that the slice bounds sent : sent+bytesToSendInCurrReq for the toWrite slice were either negative or out of range ([:-199229440]). This could be due to abnormal values in the bytesToSendInCurrReq or sent variables.
func (w *gRPCWriter) uploadBuffer(recvd int, start int64, doneReading bool) (*storagepb.Object, int64, error) {
goroutine 29284526 [running]:
cloud.google.com/go/storage.(*gRPCWriter).uploadBuffer(0xc1e9b00240, 0x856f5a, 0xc000000, 0x1)
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_client.go:2123 +0xbcd
It seems that recvd was 0x856f5a = 8,744,794, start was 0xc000000 = 201,326,592 (192 MiB), and doneReading was true.
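As a quick check on that decoding (a throwaway snippet, not SDK code):

package main

import "fmt"

func main() {
	fmt.Println(0x856f5a)              // 8744794   -> recvd
	fmt.Println(0xc000000)             // 201326592 -> start
	fmt.Println(0xc000000 / (1 << 20)) // 192       -> start in MiB
}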
Relevant code:
// ...
sendBytes: // label this loop so that we can use a continue statement from a nested block
for {
bytesNotYetSent := recvd - sent
remainingDataFitsInSingleReq := bytesNotYetSent <= maxPerMessageWriteSize
if remainingDataFitsInSingleReq && doneReading {
lastWriteOfEntireObject = true
}
// Send the maximum amount of bytes we can, unless we don't have that many.
bytesToSendInCurrReq := maxPerMessageWriteSize
if remainingDataFitsInSingleReq {
bytesToSendInCurrReq = bytesNotYetSent
}
// Prepare chunk section for upload.
data := toWrite[sent : sent+bytesToSendInCurrReq] // panic occurred here
// ...
Hypothesis:
- The recvd (received bytes) or sent (sent bytes) values might have been miscalculated for some reason, causing bytesNotYetSent to become negative. Consequently, bytesToSendInCurrReq could also become negative, leading to a panic when accessing the slice.
- The sent value is calculated as writeOffset - start. The writeOffset is updated within the determineOffset function via queryProgress. During this process, writeOffset might be incorrectly set to a value greater than start + recvd. This would cause sent to exceed recvd, eventually making bytesNotYetSent and bytesToSendInCurrReq negative.
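For reference, a minimal standalone sketch (hypothetical values, not SDK code) showing that a negative computed high bound is enough to produce exactly this class of panic message:

package main

func main() {
	toWrite := make([]byte, 1024)
	sent := 0
	bytesToSendInCurrReq := -199_229_440 // hypothetical negative value, mirroring the bound in the log

	// panics with: slice bounds out of range [:-199229440]
	_ = toWrite[sent : sent+bytesToSendInCurrReq]
}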
Regarding Panic Recovery
As seen in lines 1183-1185 of grpc_client.go, the SDK creates its own goroutine for the write operation:
// ...
// This function reads the data sent to the pipe and sends sets of messages
// on the gRPC client-stream as the buffer is filled.
go func() { // grpc_client.go:1185
defer close(params.donec)
// ...
This internal goroutine makes it impossible for the package caller to wrap the call in a recover block to handle such panics. Is there any recommended way to recover from this type of panic when it originates from within the SDK's internally managed goroutine?
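To illustrate the constraint (a standalone sketch, not SDK code): recover only catches a panic raised in the same goroutine, so a deferred recover around the caller's Write/Close has no effect on a panic inside a goroutine the SDK starts itself.

package main

import "fmt"

func main() {
	defer func() {
		// This recover runs in the caller's goroutine and never sees the panic below.
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()

	done := make(chan struct{})
	go func() {
		defer close(done)
		panic("panic inside an internally started goroutine") // crashes the whole process
	}()
	<-done
}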
Environment Information
- Docker (multi-stage) on AWS EKS
  - build image: golang:1.23-bookworm
  - runtime image: gcr.io/distroless/base-debian12
- Go version: 1.23
- go.mod:
  - google.golang.org/api v0.210.0
  - google.golang.org/grpc v1.67.1
Hi @winterjung, thank you for the detailed issue!
We did a major refactoring of this code since 1.47.0, including handling some edge cases on retries. I would suggest updating to the latest release of cloud.google.com/go/storage and seeing if that resolves the issue.
I don't believe there is a way to recover from this type of panic - it really shouldn't be happening at all and we have not encountered this before.
Do you have an idea of the size of the object for which you got this issue? How often were you seeing this issue?
@BrennaEpp Thanks for responding. We'll update the google-cloud-go SDK to the latest version and monitor the outcome. As mentioned earlier, this is the first time we've encountered this issue in over 6 months of running in production, so it may take a long time before we can be confident that the root cause has been addressed.
We're using the gRPC client for Google Cloud Storage with a chunk size of 64 MiB. We close the object when its uncompressed size exceeds 1 GiB.
e.g.
// ...
var cli *storage.Client // initialized in main.go
wc := cli.Bucket(bucket).Object(objName).NewWriter(ctx)
wc.ChunkSize = 64 * 1024 * 1024 // 64 MiB

// called in another goroutine
if writtenSize > 1*1024*1024*1024 { // 1 GiB
	if err := wc.Close(); err != nil {
		// handle the Close error
	}
}
Hi @winterjung, once again, thanks for opening the issue. I am closing this as not reproducible. If you encounter this again don't hesitate to re-open or open a new issue.
Hello @BrennaEpp
After updating to Go 1.25.3 and the SDK to [email protected], we started seeing intermittent panics occurring inside the SDK itself. These panics cannot be recovered from within our application.
Has this issue been reported before, or are there any known workarounds or fixes?
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x138beca]
goroutine 10308988 [running]:
cloud.google.com/go/storage.(*gRPCResumableBidiWriteBufferSender).sendBuffer(0x0, {0x26456f8?, 0xc50263cd80?}, {0xc53b7e0000?, 0xc000341740?, 0x261a200?}, 0xc000341740?, 0x0, 0x0)
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:514 +0x4a
cloud.google.com/go/storage.(*gRPCWriter).uploadBuffer(0xc445ba45b0, {0x26456f8, 0xc50263cd80}, 0xc2fae0d500?, 0x24?, 0x0)
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:626 +0x209
cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter.func1.(*grpcStorageClient).OpenWriter.func1.1.2({0x26456f8?, 0xc50263cd80?})
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:185 +0x50
cloud.google.com/go/storage.run.func1()
/go/pkg/mod/cloud.google.com/go/[email protected]/invoke.go:104 +0x1f0
cloud.google.com/go/internal.retry({0x26456f8, 0xc398bda4b0}, {0x3b9aca00, 0x6fc23ac00, 0x4000000000000000, 0x77359400}, 0xc0a0a35e78, 0x23c92f8)
/go/pkg/mod/cloud.google.com/[email protected]/internal/retry.go:39 +0x74
cloud.google.com/go/internal.Retry(...)
/go/pkg/mod/cloud.google.com/[email protected]/internal/retry.go:32
cloud.google.com/go/storage.run({0x26456f8, 0xc398bda4b0}, 0xc12caa5f70, 0xc445e02870, 0x0)
/go/pkg/mod/cloud.google.com/go/[email protected]/invoke.go:91 +0x317
cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter.func1.1(...)
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:200
cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter.func1()
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:220 +0x1f6
created by cloud.google.com/go/storage.(*grpcStorageClient).OpenWriter in goroutine 10309099
/go/pkg/mod/cloud.google.com/go/[email protected]/grpc_writer.go:168 +0x4ff
This occurred in the same environment as the previously reported issue, with the following dependencies:
- cloud.google.com/go/storage v1.56.3
- google.golang.org/api v0.252.0
- google.golang.org/grpc v1.76.0
- google.golang.org/protobuf v1.36.10
Please let me know if you need additional information — I’ll be happy to provide more details.
Thank you.
My read of that stack trace is that w.streamSender is a typed nil for the interface gRPCBidiWriteBufferSender, with type *gRPCResumableBidiWriteBufferSender.
I think it happens when we return an error here: https://github.com/googleapis/google-cloud-go/blob/storage/v1.56.3/storage/grpc_writer.go#L475. In that case, we assign a typed nil to the stream sender interface here: https://github.com/googleapis/google-cloud-go/blob/storage/v1.56.3/storage/grpc_writer.go#L605 and we will skip this initialization on the next run through uploadBuffer.
This can only happen for resumable uploads, since the other buffer sender init functions cannot return an error.
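For anyone following along, a generic sketch of the typed-nil pitfall being described (the type and function names here are illustrative, not the SDK's):

package main

import "fmt"

type bufferSender interface {
	sendBuffer() error
}

type resumableSender struct {
	offset int64
}

func (s *resumableSender) sendBuffer() error {
	s.offset++ // dereferences s; panics if s is nil
	return nil
}

// newResumableSender stands in for an init path that can fail and return a nil pointer.
func newResumableSender() (*resumableSender, error) {
	return nil, fmt.Errorf("starting resumable upload failed")
}

func main() {
	var sender bufferSender

	s, err := newResumableSender()
	sender = s // sender now holds a typed nil: the interface itself is non-nil
	_ = err    // suppose the error is surfaced for retry but the assignment above is kept

	if sender == nil {
		return // not taken: a typed nil does not compare equal to nil, so re-initialization is skipped
	}
	_ = sender.sendBuffer() // panic: invalid memory address or nil pointer dereference
}

In the SDK's case, the equivalent nil check passes on the retry, and the subsequent sendBuffer call dereferences the nil pointer, which matches the SIGSEGV in the trace above.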
I will check if this is still present in the v1.57.1 refactor.
This specific issue is fixed in v1.57.1 because https://github.com/googleapis/google-cloud-go/blob/storage/v1.57.1/storage/grpc_writer.go#L210 cannot set the stream sender interface to nil.
I think the patch to fix v1.56.3 is relatively straightforward and probably worthwhile. The issue has been present since https://github.com/googleapis/google-cloud-go/commit/b4d86a52bd319a602115cdb710a743c71494a88b. The prior iteration of the code didn't have an abstraction which encapsulated oneshot vs. resumable uploads, so it didn't have this precise issue.
https://github.com/googleapis/google-cloud-go/pull/13278 would fix this, I think. @BrennaEpp