bazel-remote
Docs about scaling?
I couldn't find any docs on the scalability of bazel-remote, so I'm hoping to get some information here.
Generally, is it possible to scale bazel-remote horizontally? Let's say you start small with just one client using bazel-remote. Over time more clients are added (through local development and pipeline runs). Let's say there are now 20 clients, which might want to access bazel-remote at the same time.
Now bazel-remote gets really slow and can't respond in time. In that case, could you just add another bazel-remote instance pointing to the same disk (e.g. Amazon EFS) without any issues? If not, what would the potential issues be? Would they be major or minor?
bazel-remote's cache dir can't be shared between multiple bazel-remote processes running at the same time. If you try, each instance's disk usage estimate will drift once it hits the cache size limit (maybe not a problem for Amazon EFS, but you could run out of disk space on a traditional filesystem), and some clients might get build errors when one instance reports that it has particular cache items available that another instance has already removed.
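To make the two failure modes concrete, here is a toy Python model. It is not bazel-remote's actual code: `CacheInstance`, `scan`, and the sizes are all hypothetical, and it only assumes that each process keeps its own private index and size estimate for the directory.

```python
class CacheInstance:
    """Toy cache process with a private index over a shared directory (hypothetical)."""

    def __init__(self, shared_disk, max_size):
        self.disk = shared_disk   # dict shared by all instances: key -> size
        self.index = {}           # private view: key -> size, oldest first
        self.max_size = max_size

    def scan(self):
        # Assumption: an instance learns what is on disk only at startup.
        self.index = dict(self.disk)

    def estimated_usage(self):
        return sum(self.index.values())

    def put(self, key, size):
        self.disk[key] = size
        self.index[key] = size
        # Evict the oldest items *this instance knows about* until under the limit.
        while self.estimated_usage() > self.max_size:
            oldest = next(iter(self.index))
            del self.index[oldest]
            self.disk.pop(oldest, None)

    def claims_to_have(self, key):
        return key in self.index


def simulate_shared_dir():
    disk = {}
    a = CacheInstance(disk, max_size=10)
    a.put("x", 4)

    b = CacheInstance(disk, max_size=10)
    b.scan()          # b sees "x" on disk
    b.put("y", 4)
    b.put("z", 4)     # b goes over its limit and evicts "x"

    a.put("w", 4)     # a still counts "x", and does not know about "y" or "z"
    return {
        "a_claims_x": a.claims_to_have("x"),
        "x_on_disk": "x" in disk,
        "a_estimate": a.estimated_usage(),
        "actual_usage": sum(disk.values()),
    }
```

In this model, instance `a` still advertises an item that `b` deleted, and the real disk usage ends up above the limit that both instances believe they are enforcing.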
It is possible for multiple bazel-remote instances to use the same proxy backend (e.g. S3 or GCS, or another bazel-remote instance via HTTP). I have used this method to set up office-local caches which fall back to checking a central cache. The same idea would also spread load across multiple bazel-remote instances on the same network if you wanted to experiment. But I would recommend just starting with a single cache and seeing if that scales up enough for you.
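The fallback idea can be sketched roughly like this in Python. This is an illustration of the lookup order only, not bazel-remote's implementation: `TieredCache` and its methods are hypothetical names, plain dicts stand in for the local disk cache and the shared backend, and the write-through behaviour in `put` is an assumption.

```python
class TieredCache:
    """Hypothetical local cache that falls back to a shared central backend."""

    def __init__(self, backend):
        self.local = {}         # stands in for this instance's own cache dir
        self.backend = backend  # stands in for S3, GCS, or a central cache

    def get(self, key):
        # 1. Serve from the local cache when possible.
        if key in self.local:
            return self.local[key]
        # 2. Otherwise ask the shared backend, and keep a local copy on a hit.
        value = self.backend.get(key)
        if value is not None:
            self.local[key] = value
        return value

    def put(self, key, value):
        # Assumption: writes are stored locally and forwarded to the backend.
        self.local[key] = value
        self.backend[key] = value


central = {}
office_a = TieredCache(central)
office_b = TieredCache(central)

office_a.put("action-123", b"result")
hit = office_b.get("action-123")   # miss locally, found in the central backend
miss = office_b.get("action-999")  # not in either tier
```

Each instance keeps its own private local storage, so nothing is shared on disk; only the backend is common to all of them.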
It would be really helpful to mention this somewhere in the README.