bazel-buildfarm

Setting the CAS type to GRPC causes a Java exception

Open hxndg opened this issue 3 years ago • 3 comments

While testing buildfarm 2.0.0-beta1 for another issue, we set the worker's CAS type to GRPC and pointed the target at a gRPC Bazel cache endpoint. The worker now fails with a new exception: class build.buildfarm.cas.GrpcCAS cannot be cast to class build.buildfarm.cas.cfc.CASFileCache. If I only want to use a gRPC endpoint as the CAS, should I roll back to buildfarm 1.17.0?

config example

worker:
...
  realInputDirectories:
    - "external"
  capabilities:
    cas: true
    execution: true
  cas:
    type: GRPC
    target: "{$target}"

exception log

Oct 26, 2022 1:48:10 PM build.buildfarm.worker.shard.Worker main
SEVERE: exception caught
java.lang.ClassCastException: class build.buildfarm.cas.GrpcCAS cannot be cast to class build.buildfarm.cas.cfc.CASFileCache (build.buildfarm.cas.GrpcCAS and build.buildfarm.cas.cfc.CASFileCache are in unnamed module of loader 'app')
	at build.buildfarm.worker.shard.WorkerProfileService.<init>(WorkerProfileService.java:51)
	at build.buildfarm.worker.shard.Worker.createServer(Worker.java:489)
	at build.buildfarm.worker.shard.Worker.<init>(Worker.java:453)
	at build.buildfarm.worker.shard.Worker.startWorker(Worker.java:993)
	at build.buildfarm.worker.shard.Worker.main(Worker.java:958)

hxndg avatar Oct 26 '22 13:10 hxndg

A worker must have a FILESYSTEM type CAS as its first element. Otherwise it will not be able to execute or function as a shard member.
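As a rough sketch of that ordering, a worker config might list a FILESYSTEM store first and the gRPC cache second. The `storages` key and the `path` and `maxSizeBytes` fields are assumptions based on later buildfarm example configs and may be named differently in 2.0.0-beta1. `{$target}` is the placeholder from the original report.

```yaml
worker:
  capabilities:
    cas: true
    execution: true
  storages:                      # assumed key name; the report above uses `cas`
    - type: FILESYSTEM           # first element: local store the worker executes against
      path: "cache"              # assumed field and value
      maxSizeBytes: 2147483648   # assumed field; 2 GiB
    - type: GRPC                 # second element: delegate CAS
      target: "{$target}"
```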

werkt avatar Oct 26 '22 16:10 werkt

> A worker must have a FILESYSTEM type CAS as its first element. Otherwise it will not be able to execute or function as a shard member.

Okay, but if I set a FILESYSTEM type CAS as the first element and the gRPC cache as the second, how often will the worker write its generated CAS files to the gRPC Bazel cache endpoint?

We want to test whether buildfarm can recover when some of its workers go down. The idea is that even if some workers go offline, buildfarm could still query the CAS through the gRPC Bazel cache endpoint.

hxndg avatar Oct 26 '22 17:10 hxndg

Currently, the only replication to a delegate CAS for FILESYSTEM is on expiration (think waterfall). Once a worker CAS is filled, in order to add new blobs, we spill over into the delegate. The hottest entries in the FILESYSTEM CAS remain resident on it through an LRU, while those that decay out eventually move to colder delegate storage, where they can be read through (putting them back at the front of the LRU) if needed. This tier is generally not a solution for fault tolerance, but instead is for expanded storage. This is currently the only strategy available for populating a non-worker CAS. If you have any suggestions for others, feel free to describe them.
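The waterfall behavior described above can be illustrated with a toy model. This is a minimal sketch, not buildfarm's CASFileCache: the `WaterfallCas` class, its methods, and the entry-count capacity are all hypothetical (the real store is sized in bytes), and a plain `Map` stands in for the delegate CAS.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a tiny LRU store that spills expired blobs into a delegate. */
class WaterfallCas {
  private final int capacity;                  // max resident blobs (hypothetical unit)
  private final Map<String, byte[]> delegate;  // stands in for the colder delegate CAS
  // accessOrder=true keeps the least recently used entry first in iteration order.
  private final LinkedHashMap<String, byte[]> local =
      new LinkedHashMap<>(16, 0.75f, true);

  WaterfallCas(int capacity, Map<String, byte[]> delegate) {
    this.capacity = capacity;
    this.delegate = delegate;
  }

  void put(String digest, byte[] blob) {
    local.put(digest, blob);
    // On overflow, move the coldest entries into the delegate instead of dropping them.
    while (local.size() > capacity) {
      Iterator<Map.Entry<String, byte[]>> it = local.entrySet().iterator();
      Map.Entry<String, byte[]> coldest = it.next();
      delegate.put(coldest.getKey(), coldest.getValue());
      it.remove();
    }
  }

  byte[] get(String digest) {
    byte[] blob = local.get(digest); // a hit refreshes the entry's LRU position
    if (blob == null) {
      blob = delegate.get(digest);
      if (blob != null) {
        put(digest, blob); // read through: the blob returns to the front of the LRU
      }
    }
    return blob;
  }

  boolean isLocal(String digest) {
    return local.containsKey(digest);
  }
}
```

In this model, the delegate receives a blob only when that blob expires from the local store. A blob that is still resident when the worker disappears was never written to the delegate, which is why this tier expands storage but does not provide fault tolerance.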

Buildfarm's model is eventual consistency, with expected behaviors from a client that trigger re-executions in case of worker disappearance.

werkt avatar Nov 02 '22 04:11 werkt