bazel-remote
Question about running 2 or more remote cache instances
Hi, this is a question rather than a bug report. I know that for a remote cache request, the client first asks the AC and then the CAS. But if we only have the local disk store and run 2 bazel-remote instances, that could cause inconsistency issues. So I have 2 questions:
- If both instances are backed by one S3 bucket, does this problem go away? The process would be: if we don't have the blob locally, we first ask S3 and put the content back on disk. Does our code logic work like that?
- Could we just store everything in S3 instead of having local storage? Could we have that feature?
Thanks!
> Hi, this is a question rather than a bug report. I know that for a remote cache request, the client first asks the AC and then the CAS. But if we only have the local disk store and run 2 bazel-remote instances, that could cause inconsistency issues. So I have 2 questions:
> - If both instances are backed by one S3 bucket, does this problem go away? The process would be: if we don't have the blob locally, we first ask S3 and put the content back on disk. Does our code logic work like that?
While S3 has been strongly read-after-write consistent since late 2020[*], using two bazel-remote instances with the S3 proxy backend does not provide strong cache consistency if a client switches between the two bazel-remote instances within a single build, because bazel-remote stores blobs locally and uploads them to S3 asynchronously.
[*] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
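For reference, a minimal sketch of a bazel-remote config using the S3 proxy backend could look something like the following. The endpoint, bucket and credentials are placeholders, and the exact keys can vary between bazel-remote versions, so check the README for the release you run:

```yaml
# Sketch of a bazel-remote YAML config with an S3 proxy backend.
# All values are placeholders; verify key names against your version's README.
port: 8080                # HTTP port
grpc_port: 9092           # gRPC port
dir: /data/bazel-remote   # local disk cache, still used in front of S3
max_size: 100             # local cache size limit, in GiB
s3_proxy:
  endpoint: s3.us-east-1.amazonaws.com
  bucket: my-bazel-cache-bucket
  prefix: bazel-remote
  access_key_id: EXAMPLE_ACCESS_KEY
  secret_access_key: EXAMPLE_SECRET_KEY
  disable_ssl: false
```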
Why do you need two bazel-remote instances?
> - Could we just store everything in S3 instead of having local storage? Could we have that feature?
That would be possible with some refactoring, but I suspect there would be a performance hit. Also, I don't know how to implement LRU-like cache size limiting with S3, so I would feel bad recommending this configuration.
Thanks for the reply. The reason for 2 instances is higher availability; we could manage them with k8s etc. Yes, implementing LRU for S3 is limited, but sometimes higher availability is more important. The nginx remote cache solution is usually backed by a single S3 bucket, so I think having that feature (only upload to S3 and don't use local disk) would also not hurt. Another option would be a flag that makes uploads and downloads to/from S3 synchronous, so we could run multiple bazel-remote instances. Just sharing some rough thoughts here. What do you think 🙏
I don’t use or know much about S3.
But in general, I think it would be easier to achieve simple and reliable failover to a secondary cache instance if the bazel client were better at retrying after issues. More specifically:
- For Builds-Without-The-Bytes: the bazel client needs to do action rewinding or automatically restart the whole build. See https://github.com/bazelbuild/bazel/issues/10880
- For remote execution: I think the bazel client should be better at retrying uploads and execution of actions via --remote_retries. What is missing today is a way for the bazel client to switch to a secondary cache instance and discard its information about what is already available in the remote CAS.
- For other scenarios: I guess the bazel client already handles remote cache inconsistencies similarly to cache misses, and maybe that is good enough.
I run something which looks similar to what you want:
We have multiple bazel-remote instances (currently 6) with identical config inside a kubernetes cluster. Each has its own local disk and a shared s3 bucket.
These are fronted by a couple of nginx instances managed by ingress-nginx. Using nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri" makes requests for the same blob generally hit the same bazel-remote replica each time, except in the cases where we've recently increased/decreased the number of replicas, or we're in the process of deploying a new version.
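For anyone curious, the ingress side is roughly along these lines (host, Service name and port are placeholders, assuming the standard ingress-nginx controller):

```yaml
# Sketch of an ingress-nginx Ingress that hashes requests by URI, so the
# same blob is normally served by the same bazel-remote replica.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bazel-remote
  annotations:
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  ingressClassName: nginx
  rules:
    - host: bazel-cache.example.com   # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bazel-remote    # placeholder Service fronting the replicas
                port:
                  number: 8080        # bazel-remote HTTP port (assumed default)
```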
Depending on how long items stored in the cache are relevant to your clients, the extra s3 backend might not be needed for your use case.
@kragniz: if you'd be willing to share some example configurations, I'd love to add it to an examples doc or directory here.
Thanks @kragniz for sharing your idea! I'm also interested in your example! So if the same blob always hits the same remote-cache instance and that instance goes down, we still won't have HA in that case, right? And when you increase/decrease the number of replicas, what do you do? @mostynb, as in kragniz's case, multiple instances could also improve throughput, and I hear S3 can handle large throughput these days. So if we had a feature to skip the local disk and use S3 directly, or to talk to S3 synchronously, we could run some tests to see the real performance.
@mostynb sure, I'll create a PR at some point
@BoyangTian-Robinhood requests will always get sent to a ready instance (there's a readiness probe configured to make sure instances are correctly returning the empty CAS blob), so requests will get redirected to one of the other instances in that case. It is likely that most of the objects have already been uploaded to S3, so that second instance will look them up and get a cache hit (with some extra latency). This generally means all requests get a response, but the cache hit rate and latency are both slightly worse while restarting/scaling instances.
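To illustrate, the readiness check is roughly this (the port and timings are placeholders; the long hex string is the SHA-256 of the empty blob, which is the value mentioned above):

```yaml
# Sketch of a readiness probe that only marks the pod ready once it can
# serve the empty CAS blob over HTTP.
readinessProbe:
  httpGet:
    path: /cas/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    port: 8080          # assumed bazel-remote HTTP port
  initialDelaySeconds: 5
  periodSeconds: 10
```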
Hi @kragniz, thanks for the detailed explanation! Sorry, one part is still not clear to me. For example, say the first AC request goes to bazel-remote-instance-1, which has both the AC and CAS entries on its local disk, so it tells the client the cache entry exists. Then bazel-remote-instance-1 dies. When the CAS request arrives, the readiness probe will return something like a 404; in that case, since it doesn't answer with a missing CAS entry, there is no bazel error, right? But if, by the time the CAS request comes, the new instance-2 has already been started by k8s, we will still get the same bazel error, right? (If I understand correctly, instance-2 is only started because k8s detected that instance-1 died.) Or, since the disks are mounted, does the new instance use the same mounted disk location, so it still has instance-1's disk store?
Another question: the local disk is mounted, right? The disk size k8s allows for a managed docker container has an upper limit, so this isn't a single Deployment with 6 replicas, right? Is it 6 separate components, or 6 Deployments each with 1 replica?
@BoyangTian-Robinhood: are you still looking for help with this?