
how to do a production deployment for a high IO situation

Open danni-yan opened this issue 1 year ago • 11 comments

If I want to do a production deployment, what suggestions do you have?

The configurations I can find are all for single-instance deployment. I believe that when the size of stored data is large and the requests are high, I/O could become a bottleneck.

So do you have any recommended deployment methods similar to sharding?

I haven't read the code yet. If this is a very basic question, please don't hesitate to answer it. Thx.

danni-yan avatar Oct 16 '24 12:10 danni-yan

For a large deployment, I would recommend:

  1. Configure both clients and server to use zstandard compression. This reduces both network and disk traffic. Use fast storage, SSD-like or better. See how far this scales before optimising further.

  2. Consider partitioning clients: run multiple instances with separate storage if you hit disk IO bottlenecks and can't move to faster storage. You could, for example, configure clients that build different things to use separate bazel-remote instances, eg a cache server just for linux clients, another one for mac clients and so on. Or a cache server for trunk, another one for release branches etc. Or location-based partitioning. The important requirement here is that clients should not switch between partitions during a single build.

  3. Consider tiered servers, eg use one central bazel-remote server with a lot of storage, and have clients connect to multiple smaller bazel-remote instances which are configured to use the central bazel-remote server as an HTTP or gRPC backend proxy. As in (2), it is important that clients do not switch between cache servers during a single build. A minimal sketch of (1) and (3) follows below.
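In that sketch, the host names are placeholders and the flag names are from memory, so double-check them against bazel-remote --help and the Bazel docs for the versions you run:

    # Central cache, storing blobs zstd-compressed on disk:
    bazel-remote --dir /data/cache --max_size 500 \
      --storage_mode zstd \
      --grpc_address 0.0.0.0:9092

    # Edge instance that falls back to the central server over gRPC:
    bazel-remote --dir /data/edge-cache --max_size 100 \
      --storage_mode zstd \
      --grpc_proxy.url grpc://central-cache.internal:9092

    # Clients enable wire compression and point at their edge instance:
    bazel build //... \
      --remote_cache=grpc://edge-cache.internal:9092 \
      --remote_cache_compression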

mostynb avatar Oct 16 '24 21:10 mostynb

Really appreciate your quick response.

So, for high I/O situations, bazel-remote needs multiple instances set up on the server side, with clients configured to use different instances. The concept of tiered servers is essentially the same idea.

I'm wondering if you have plans to implement sharding on the server side. That way, users would have a unified access point, and when expansion is needed, changes could be made only on the server side, making it more convenient to use.

danni-yan avatar Oct 17 '24 09:10 danni-yan

https://github.com/buildbarn/bb-storage is written as both a routing and storage tool. https://github.com/buildbarn/bb-deployments/ has examples where it is used as a sharding frontend, and you can choose to use Bazel-remote as the storage backend. There are deployments of Buildbarn reaching petabytes of storage.

moroten avatar Oct 17 '24 10:10 moroten

Hey, I'm looking at setting up something similar for a fairly large environment at the moment.

I was wondering if there are any opinions on doing the setup described here?

Specifically:

  • deploying n shards of bazel-remote
  • pointing all of them to shared S3-compatible blob storage
  • routing to a specific shard through nginx directives based on the incoming request

I wonder how the above works with respect to @mostynb's remark:

The important requirement here is that clients should not switch between partitions during a single build.

I guess if we accept some downtime during scaling in or out, then technically this is a feasible option?


I also wonder how that compares to what @moroten suggested with bb-storage above. Would appreciate any input here.

fiffeek avatar Oct 17 '24 11:10 fiffeek

I also wonder how that compares to what @moroten suggested with bb-storage above. Would appreciate any input here.

Instead of using nginx for routing, the bb-storage frontend would be used. Because bb-storage is protocol aware, it can perform existence checking on GetActionResult against all the storages, depending on where the output blobs are expected to be located.

moroten avatar Oct 17 '24 12:10 moroten

I'm not sure if it's appropriate to continue discussing bb-storage here, but I still want to ask more questions.

According to what you said, the bb-storage frontend will sequentially (or randomly) pass the GetActionResult request to the backend cache services after receiving it, until it can retrieve the cache. Is that correct?

What about UpdateActionResult? Will the traffic to the backend instances be balanced?

danni-yan avatar Oct 17 '24 13:10 danni-yan

The sharding in bb-storage is done by digest, so bb-storage will only make read/write/find-missing calls to that single backend. Think of N nodes: a blob with digest D is assigned to backend B = hash(D) mod N (weighted, if your shards have different sizes).
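As a toy illustration of that assignment (not bb-storage's actual code, and ignoring weighting):

    # Toy illustration: derive a shard index from a blob's SHA-256 digest.
    # N=4 backends here; bb-storage's real scheme also supports weighting.
    DIGEST=$(printf 'some blob contents' | sha256sum | cut -d' ' -f1)
    SHARD=$(( 0x${DIGEST:0:8} % 4 ))  # first 32 bits of the digest, mod N
    echo "digest ${DIGEST} -> backend ${SHARD}"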

There are also configurations for splitting by instance name and mirroring, but this thread is mostly about sharding.

moroten avatar Oct 17 '24 13:10 moroten

It's OK with me to discuss these kinds of setups here, but the bb-deployments repository might be a better place to look for examples. If there is a good example to point to, we can add a link in bazel-remote's README.md file.

I know of a few fairly large (but maybe not petabyte) cache server setups using a combination of bb-storage and bazel-remote, with bb-storage handling the sharding and disabling ActionResult validation in bazel-remote (because the bazel-remote instances don't know how to check if blobs exist on the other bazel-remote instances). IIRC bb-storage uses the prefix of SHA256 digests to define the sharding, and that gives some basic load-balancing between the shards (unless the client picks ActionCache keys that all share similar prefixes).

I don't recommend using S3 blob storage, because I do not know if it is possible to implement LRU-like cache eviction in this setup, and other forms of cache eviction like basic TTL setups will cause build failures.

And as @moroten mentioned, you should use an REAPI-aware frontend, not something generic like nginx.
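For reference, each shard behind such a frontend is just a plain bazel-remote instance started along these lines (a sketch; verify the exact flag name with bazel-remote --help):

    # One shard behind a bb-storage frontend. ActionResult dependency
    # checking is disabled because this shard cannot see blobs stored on
    # its sibling shards; bb-storage does the routing by digest.
    bazel-remote --dir /data/shard-0 --max_size 200 \
      --grpc_address 0.0.0.0:9092 \
      --disable_grpc_ac_deps_check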

mostynb avatar Oct 17 '24 18:10 mostynb

Okay, thanks a ton @mostynb.

I do have a follow up question on:

I don't recommend using S3 blob storage, because I do not know if it is possible to implement LRU-like cache eviction in this setup, and other forms of cache eviction like basic TTL setups will cause build failures.

I went through the code, and I believe the current state is that bazel-remote with S3 essentially copies artifacts from S3 to local storage, ensures that the local storage follows LRU, and evicts files from that storage; however, no eviction whatsoever is done from the S3 storage. Is this understanding correct?


For:

you should use an REAPI-aware frontend, not something generic like nginx.

Is there any existing config like this, but with bazel-remote, that any one of you (@mostynb, @moroten) would be able to share?

I dug up the protos for the config so I can reverse-engineer this, but an example would be awesome.


EDIT: After re-reading:

If there is a good example to point to, we can add a link in bazel-remote's README.md file.

I reckon there is no example, unfortunately; I'll work on that then...

fiffeek avatar Oct 17 '24 19:10 fiffeek

I went through the code, and I believe the current state is that bazel-remote with S3 essentially copies artifacts from S3 to local storage, ensures that the local storage follows LRU, and evicts files from that storage; however, no eviction whatsoever is done from the S3 storage. Is this understanding correct?

Correct. Bazel-remote always uses a local disk cache, which it tries to keep under the --max_size setting using LRU eviction. It also optionally reads and writes blobs to proxy backends (S3, GCS, etc) but performs no cache eviction for the proxy backends. PRs to add this feature for individual proxy backend implementations would be welcome (please open an issue for discussion first).
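For example, an S3-compatible proxy backend is configured on the bazel-remote side roughly like this (option names from memory; the README has the full list, and eviction still only ever happens on the local disk):

    # Local LRU disk cache that mirrors blobs to an S3-compatible backend:
    bazel-remote --dir /data/cache --max_size 100 \
      --s3.endpoint minio.internal:9000 \
      --s3.bucket bazel-cache \
      --s3.auth_method access_key \
      --s3.access_key_id "$S3_KEY" \
      --s3.secret_access_key "$S3_SECRET"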

Is there any existing config like this, but with bazel-remote, that any one of you (@mostynb, @moroten) would be able to share?

I don't have one handy, but I think you should specify 'grpc' backends instead of the 'local' backends used in the example you linked to.

mostynb avatar Oct 17 '24 22:10 mostynb

The backends are specified in common.libsonnet, which in your case should point at Bazel-remote.

moroten avatar Oct 18 '24 09:10 moroten

Since there is no further discussion, I'll close this issue.

Thank you all for your replies!

danni-yan avatar Oct 21 '24 09:10 danni-yan

Correct. Bazel-remote always uses a local disk cache, which it tries to keep under the --max_size setting using LRU eviction. It also optionally reads and writes blobs to proxy backends (S3, GCS, etc) but performs no cache eviction for the proxy backends. PRs to add this feature for individual proxy backend implementations would be welcome (please open an issue for discussion first).

I may have some cycles to burn on exploring LRU for the S3 backend sometime in the not too distant future. I think the main issue is the need to scan the S3 bucket (which will be quite expensive for large volumes of objects). Am I otherwise missing something here?

kellyma2 avatar Oct 25 '24 04:10 kellyma2

I may have some cycles to burn on exploring LRU for the S3 backend sometime in the not too distant future. I think the main issue is the need to scan the S3 bucket (which will be quite expensive for large volumes of objects). Am I otherwise missing something here?

Yes: I'm not sure if there's a good way to do that with S3.

Before they shut down, Zenly had a cache with a TTL-based solution for GCS, which might be interesting to copy into bazel-remote (behind a configuration option). IIUC, whenever it touched a blob on GCS it would also update some metadata that was used by GCS-side "object lifecycle management". Maybe there's something similar for S3?
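On the GCS side that boils down to a lifecycle rule keyed on CustomTime, roughly like this (a sketch with a placeholder bucket name; the cache has to bump each blob's CustomTime whenever it touches it):

    # lifecycle.json: delete objects whose CustomTime has not been
    # refreshed in 30 days.
    {
      "rule": [
        {"action": {"type": "Delete"},
         "condition": {"daysSinceCustomTime": 30}}
      ]
    }

    # Apply it to the bucket:
    gsutil lifecycle set lifecycle.json gs://my-cache-bucket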

I have an archive of their repository here: https://github.com/mostynb/znly-bazel-cache

mostynb avatar Oct 25 '24 20:10 mostynb

Interesting. My use case is slightly different: I'm not using S3 itself, I'm using an "S3-compatible" solution on-prem, attached to Kubernetes. I have to run some experiments from a performance standpoint to see if we can get away with just the S3 backing store and no disk cache (understanding that this is not something that works today).

kellyma2 avatar Oct 25 '24 20:10 kellyma2

Another point to consider is compression, which is not supported by the bb-storage front-end but is supported by bazel-remote. An interesting question arises: in this setup, will the bb-storage front-end correctly pass compressed requests to bazel-remote or not?

develar avatar Nov 29 '24 06:11 develar

IIUC bb-storage will only make regular/uncompressed data requests, so any communication with bazel-remote would not be compressed. If bazel-remote were using the (default) compressed storage mode, then it would need to decompress the data from disk on every request before sending the uncompressed data back to bb-storage.

mostynb avatar Nov 29 '24 15:11 mostynb