cortex icon indicating copy to clipboard operation
cortex copied to clipboard

Update GRPC

Open alanprot opened this issue 7 months ago • 1 comments

What this PR does:

This PR attempts to update the Go-GRPC library, which has been pinned for a while. The newer version of Go-GRPC reuses unmarshalling wire buffers internally, which can potentially race with our own buffer reuse logic, leading to message corruption.

A test was added to demonstrate this issue in the Push and QueryStream methods.

Problem background:

In Cortex, we use yolostring when unmarshalling bytes from the wire into proto structs (see: Cortex yolostring). Yolostring creates a string from the wire buffer’s bytes without copying them, effectively reusing the original bytes to reduce memory allocations.

Prior to Go-GRPC v1.65.0, wire buffers were created and used within a single request, so reusing those bytes via yolostring was safe.

However, starting with Go-GRPC v1.66.0, the wire buffers are now pooled and reused across multiple requests. The bytes are returned to the pool immediately after the Unmarshal method is called (reference and https://github.com/grpc/grpc-go/issues/6619). This means that the bytes might be overwritten before the Cortex code finishes using them, leading to potential message corruption.

In this PR, we are creating a new Proto Codec that is not freeing the buffer used to unmarshall the message back to the pool: See this in comparison to the go-grpc code here - in practical terms, we are by default falling back to the behavior before v1.66.0. However, we can also now change the proto message to implement the ReleasableMessage interface and, if done, the message will now have an extra Free() that can be called to release the buffer back to the pool when is safe to do so -> so far, only the WriteRequest message is implementing this as this is the one that has a huge TPS with a very short lived lifecycle. In the future, we can add this capabilities to other proto messages.

Which issue(s) this PR fixes: Fixes #

Checklist

  • [X] Tests updated
  • [NA] Documentation added
  • [NA] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

alanprot avatar Jun 10 '25 17:06 alanprot

LGTM

SungJin1212 avatar Jun 12 '25 01:06 SungJin1212

This is being running for some time internally and seems that its good. Merging it!

alanprot avatar Jun 26 '25 16:06 alanprot