CAS: Existence Caching in Intermediate Caches (user experience report)
I work for Cruise (https://getcruise.com). I was talking to our Google Account Manager JP Guerra a while back about this issue and he thought it'd be useful to share our experience with upstream. I liked the idea, so here I am. 😁
We built an in-house RBE service. We diverged from the "ContentAddressableStorage" service API to support existence caching in intermediate caches. I wanted to share what we did and why. We'd prefer not to diverge from the upstream API, but we needed to in this case.
On our internal CAS we register an additional gRPC service called "CruiseContentAddressableStorage". It has a method that is the inverse of the "ContentAddressableStorage" service's "FindMissingBlobs" method: instead of finding blobs which do not exist, the "FindBlobs" method finds blobs which do exist, so that we can return metadata associated with each object. Specifically, we return a timestamp called "expires_at", which is the wall-clock time until which intermediate existence caches may record that an object exists (we never cache non-existence, because that would cause inconsistency). This enables an intermediate CAS (one that proxies for another CAS) to cache existence.
service CruiseContentAddressableStorage {
  // Inverse of ContentAddressableStorage.FindMissingBlobs: returns the blobs
  // that do exist, along with per-blob metadata.
  rpc FindBlobs(FindBlobsRequest) returns (FindBlobsResponse) {}
}

message FindBlobsRequest {
  string instance_name = 1;
  repeated Digest blob_digests = 2;
}

message FindBlobsResponse {
  repeated FoundBlobMetadata found_digest_metadata = 1;
}

message FoundBlobMetadata {
  Digest digest = 1;
  // Wall-clock time until which intermediate caches may record that this
  // blob exists. Existence only; non-existence is never cached.
  google.protobuf.Timestamp expires_at = 2;
}
We had thought of using gRPC metadata for the "expires_at" timestamps, but we have batches of around 15,000 digests (each with its own timestamp), which would have been a challenge to pack into gRPC metadata due to HTTP/2 header size limits. So we registered the additional "CruiseContentAddressableStorage" service, and we dynamically fall back to doing no intermediate existence caching if the server returns the gRPC code "Unimplemented" (meaning the service or method doesn't exist). With that fallback we remain compatible with upstream servers.
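The fallback itself is small. A rough sketch, continuing the hypothetical proxy above and assuming the standard google.golang.org/grpc/status and google.golang.org/grpc/codes packages:

// findBlobsOrFallBack calls the Cruise-specific FindBlobs and reports whether
// existence caching is usable. codes.Unimplemented means the backend does not
// register CruiseContentAddressableStorage, so callers should proxy
// FindMissingBlobs straight through with no intermediate existence caching.
func (p *proxy) findBlobsOrFallBack(ctx context.Context, req *cpb.FindBlobsRequest) (*cpb.FindBlobsResponse, bool, error) {
    resp, err := p.cruiseCAS.FindBlobs(ctx, req)
    if status.Code(err) == codes.Unimplemented {
        return nil, false, nil // fall back to plain proxying
    }
    return resp, true, err
}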
The underlying reason we're doing existence caching in intermediate caches is that our backing database, Spanner, cannot handle the required read rate, or the write rate for updating atime (access time), which we use for expiration. The CAS that talks to Spanner has an in-memory existence cache, but that also cannot scale high enough, so we need to propagate the fact that blobs exist to cache levels closer to Bazel. We also have to jitter atime updates to avoid bursts of writes on Spanner, which requires that our "expires_at" times differ per object so the atime update load on Spanner is spread over time.
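For illustration only, here's a sketch of one way a backend could pick per-object "expires_at" values with deterministic jitter so the corresponding atime refreshes don't all hit Spanner at once. The constants and the FNV-based spread are made up for the example (imports hash/fnv and time), not our actual values:

const (
    baseExistenceTTL = 6 * time.Hour // hypothetical: minimum time intermediates may cache existence
    jitterWindow     = 2 * time.Hour // hypothetical: per-object spread on top of the base
)

// expiresAtFor derives a per-digest jitter so different blobs expire at
// different offsets, spreading the atime refresh writes over the jitter
// window instead of producing a burst when a batch of blobs expires together.
func expiresAtFor(digestHash string, now time.Time) time.Time {
    h := fnv.New64a()
    h.Write([]byte(digestHash))
    jitter := time.Duration(h.Sum64() % uint64(jitterWindow))
    return now.Add(baseExistenceTTL + jitter)
}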
I think the key idea is that the API is missing information needed to do existence caching in intermediate caches.
Hi Seth,
Thanks for sharing this with the group! I think your workaround is reasonable, and is actually more-or-less what's intended by the design of the CAS API. We intentionally don't expose an 'expiration time' on blobs because doing so would restrict possible implementations that might meet different constraints. For instance, some implementations might choose to implement an upper bound on total CAS size with an eviction algorithm when it hits that limit, others might use a static TTL, and still others might use a TTL with lifetime extension. Thus, it's up to the various implementations to define their strategy and implement the right pieces to make it work (and to respect the sometimes annoyingly imprecise limits like this one https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto#LL152C24-L152C24 on cache references).
In your case, you've found a way to expose some implementation-specific metadata for blobs that helps you build intermediate caches. It does mean that your intermediate cache service is only truly compatible with your own implementation (ignoring the fallback behavior on "unimplemented") but that seems correct to me because the cache's behavior really depends on the specifics of the CAS eviction method in the server. It also has the potential disadvantage that your intermediate caches "hide" the accesses from the logic in your main server, so TTL extension might not work properly.
A reasonable alternative would be to give blobs in the cache a "short" lifetime and then refetch them as needed. You could also do extend-on-touch in the cache layer, with a call to FindMissingBlobs on the main server when extending the cache lifetime to verify that the object is still present. Generally RE implementations will make some effort to guarantee that a blob referenced by a call to FindMissingBlobs will live for a reasonable (but unspecified) period of time after that call.
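A rough sketch of that extend-on-touch alternative, assuming a proxy that holds a standard REAPI client (upstreamCAS), a local existence cache with a Put method, and a locally chosen shortTTL; all of those names are placeholders, not part of the API:

// stillExists re-verifies a cached blob against the main server on touch and,
// if it is still present, extends the local cache entry by shortTTL. Missing
// blobs are simply not re-cached (non-existence is never cached).
func (p *proxy) stillExists(ctx context.Context, instance string, d *repb.Digest) (bool, error) {
    resp, err := p.upstreamCAS.FindMissingBlobs(ctx, &repb.FindMissingBlobsRequest{
        InstanceName: instance,
        BlobDigests:  []*repb.Digest{d},
    })
    if err != nil {
        return false, err
    }
    if len(resp.MissingBlobDigests) > 0 {
        return false, nil // evicted upstream; let the local entry lapse
    }
    // The server has just vouched for the blob via FindMissingBlobs, so it
    // should stay live for some (unspecified) period; extend our entry.
    p.cache.Put(d.Hash, time.Now().Add(shortTTL))
    return true, nil
}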
Thanks, Steven
Hey Steven,
You're welcome. The API is well designed. I had several "ah hah" moments during implementation when I realized why the API was designed in a specific way.
The subtle part of what we're doing is that we're not returning how long the object will exist; we're returning how long existence of the object may be cached (which is <= how long the object exists). We do this so that existence checks still reach the layer capable of bumping atime in a jittered way, without overloading our database. I think that's a generic piece that doesn't constrain expiration implementations, but it may be too niche for anyone else to want. People in the same situation can always do what we're doing, as long as they control the whole stack; but if they don't control everything, there'd be no way to do intermediate existence caching without introducing inconsistency.
Anyway, I just wanted to share this little thing we ran into in case it's useful to someone. Feel free to mark the issue appropriately and close it.
Seth