bazel-buildfarm
Hex bucketing does not work in the production environment
While we had success enabling hex bucketing in dev and stage, the container crashed when I attempted to enable it in production under moderate load, using this setting:
hex_bucket_levels: 1
[INFO ] build.buildfarm.worker.shard.Worker <init> - buildfarm-worker-10.35.222.15:8981-d655c277-29ae-4606-9621-806b8c5577d5 initialized
[INFO ] build.buildfarm.cas.cfc.CASFileCache start - Initializing cache at: /var/buildfarm/worker/cache
[INFO ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Scanning Cache Root...
[INFO ] build.buildfarm.cas.cfc.CASFileCache logCacheScanResults - {"keys":94457,"dirs":1051,"delete":49686}
[INFO ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Populating Directories...
[INFO ] build.buildfarm.cas.cfc.CASFileCache logComputeDirectoriesResults - {"invalid dirs":0}
[INFO ] build.buildfarm.cas.cfc.CASFileCache start - Creating Index
[INFO ] build.buildfarm.cas.cfc.CASFileCache start - Index Created
[INFO ] build.buildfarm.cas.cfc.CASFileCache start - Startup Time: 9s
[INFO ] build.buildfarm.metrics.prometheus.PrometheusPublisher startHttpServer - Started Prometheus HTTP Server on port 9090
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/75f75c2c-9f84-42e4-84a5-504d3d62e136
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5257bcc2-3a06-4cd3-9829-e4e008f03ee1
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5e8fd818-8955-452a-9c8c-968a929bdf5a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ad71de7a-89c0-4cd2-96ca-a0f557fc75b9
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/15a29715-3f44-4fea-bc75-a71346363ea1
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ddd2f02f-44c2-4798-9c46-f95711e35aba
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1a4bf314-dfe4-4318-9edb-5c46aa01cf0d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/38517b9e-d03b-492f-ad8c-48bab12c424b
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/c837c7b7-5e39-4dc3-87ff-f9acb4fd0664
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9802f9f7-d6f6-454a-9613-d96b1b168deb
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fadb11dd-9d66-489b-ac02-17209c07e527
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/16caf333-3988-4537-9db5-d41f86d91b4d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/0eb949b4-6fab-48bd-9ebd-be0a15503027
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bd278e33-fdf7-4163-8aa2-4dde25866c49
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/2933b14d-1222-4ac5-bd0f-6f655c6547c6
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1262d023-a4fa-4d99-b5f5-7eb790418977
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/241e829a-f73f-40e8-8b9a-085627093f2f
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/e5ee6678-b561-4f1e-a575-a8518ab07bff
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/7d6bfbe9-5b33-4091-a882-6523dc0d8c89
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ae5b9793-69e0-46c7-a9f5-e4127672f927
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d5ccb622-bc32-4b70-94d7-c4a0740ada6b
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/46b70620-eb4b-48ce-a7d7-89fd54aeada0
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/740601c2-1cbb-4c22-a771-f036af984926
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/4c2dbd8e-a377-4de9-8382-a58b5ccf9320
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bfba36ca-6840-4208-ac8b-c6d527bceb93
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/352b71cd-f99b-4661-b9ff-d71ecf7f4ffa
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/19be605c-3895-4d4f-8eef-d69b5ced16bc
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/0696315f-1713-4e09-8458-360e093c95d2
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fa155540-a2c5-47e5-bc3d-411aef239958
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d8c5b63a-7ddf-4730-aacc-2179fe964045
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/538709d8-c4ab-472d-8057-cd8c42137289
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d8aa22b1-3eea-4475-9551-e551ef581319
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/3b84542a-e620-4a26-9fbf-87f294cf13da
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/72bb33f6-aada-4be4-91b1-831fffc82f14
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/7f127e0c-7f6e-47a9-9ad4-7d719093c56a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/8ed1e62a-dfe0-4854-bc4d-1df00641f0e9
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/541bfc7b-3beb-4432-99db-883aac87be31
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/74e8770e-8236-47e1-9a3f-2e6730ae244a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/95cd1be1-f3e4-4803-b205-dc634f4cefaa
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/c50c4168-b4b4-4dc9-ada6-469a1a965035
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5881eb3d-1acd-416e-bbe3-0581014fe757
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fde43662-f445-4d54-ac00-ba7b299c9862
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bfb43ba0-ddf8-4ac4-b657-f66bccfeb968
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d34ed40c-5d4d-4ea2-a5af-16e30138b05d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/4fe2f46e-1343-44a1-b4d7-57782c4ff962
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/3a5daaf5-b4eb-436b-a94d-5df1b871e4ca
[WARNING] build.buildfarm.cas.cfc.CASFileCache copyExternalInput - error downloading 8a5a830f61521bf482f3a286cb8531485f60895c/1692
java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
at build.buildfarm.instance.shard.RemoteInputStreamFactory$1.onFailure(RemoteInputStreamFactory.java:256)
at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:980)
at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:970)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:321)
at com.google.common.util.concurrent.MoreExecutors$5.execute(MoreExecutors.java:1108)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:47)
at build.buildfarm.instance.shard.Util$1.complete(Util.java:129)
at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:142)
at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:126)
at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:164)
at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:161)
at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:197)
at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:188)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:558)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:531)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/161372a8-46ed-4102-84ae-767bd66a6485
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ad1d3b40-28f1-4e3c-9ea0-fc47ad238aa3
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9f92ae70-9c9b-46ad-befa-821d32a9c9dd
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/6578de7d-8d14-4413-bb7b-04da58a1457d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/b4b1dd06-3ef9-4ae9-afc0-c4aaf048a66d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1c9a3034-86bb-4283-bb5f-4f81ad94486c
[SEVERE ] build.buildfarm.worker.PipelineStage run - MatchStage::run(): stage terminated due to exception
java.lang.NullPointerException: Cannot invoke "build.buildfarm.v1test.QueueEntry.getPlatform()" because "queueEntry" is null
at build.buildfarm.worker.DequeueMatchEvaluator.shouldKeepOperation(DequeueMatchEvaluator.java:57)
at build.buildfarm.worker.shard.ShardWorkerContext.matchInterruptible(ShardWorkerContext.java:315)
at build.buildfarm.worker.shard.ShardWorkerContext.match(ShardWorkerContext.java:378)
at build.buildfarm.worker.MatchStage.iterate(MatchStage.java:141)
at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:44)
at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:51)
at java.base/java.lang.Thread.run(Thread.java:832)
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - error creating exec dir for shard/operations/9a6f04f2-7f80-41a9-bf3e-21772fa03a6f
build.buildfarm.worker.shard.CFCExecFileSystem$ExecDirException: /var/buildfarm/worker/shard/operations/9a6f04f2-7f80-41a9-bf3e-21772fa03a6f: 1 exceptions
at build.buildfarm.worker.shard.CFCExecFileSystem.checkExecErrors(CFCExecFileSystem.java:307)
at build.buildfarm.worker.shard.CFCExecFileSystem.createExecDir(CFCExecFileSystem.java:371)
at build.buildfarm.worker.shard.ShardWorkerContext.createExecDir(ShardWorkerContext.java:721)
at build.buildfarm.worker.InputFetcher.fetchPolled(InputFetcher.java:179)
at build.buildfarm.worker.InputFetcher.runInterruptibly(InputFetcher.java:85)
at build.buildfarm.worker.InputFetcher.run(InputFetcher.java:269)
at java.base/java.lang.Thread.run(Thread.java:832)
Suppressed: build.buildfarm.cas.cfc.CASFileCache$PutDirectoryException: /var/buildfarm/worker/cache/66/663cb99dd9cb52b9f91e1438223ed291e703ec09_dir: 1 exceptions
at build.buildfarm.cas.cfc.CASFileCache.lambda$putDirectorySynchronized$19(CASFileCache.java:2177)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:213)
at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:202)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:118)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1429)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Suppressed: java.util.concurrent.ExecutionException: java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:566)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:527)
at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:88)
at build.buildfarm.cas.cfc.CASFileCache.lambda$putDirectorySynchronized$19(CASFileCache.java:2168)
... 9 more
Caused by: java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
at build.buildfarm.instance.shard.RemoteInputStreamFactory$1.onFailure(RemoteInputStreamFactory.java:256)
at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:980)
at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:970)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:321)
at com.google.common.util.concurrent.MoreExecutors$5.execute(MoreExecutors.java:1108)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:47)
at build.buildfarm.instance.shard.Util$1.complete(Util.java:129)
at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:142)
at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:126)
at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:164)
at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:161)
at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:197)
at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:188)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:558)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:531)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9b305073-2647-443b-a482-3fbfd72bc8f2
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/927f28e7-c641-424d-9a90-462df86da47e
[INFO ] build.buildfarm.worker.Pipeline join - Interrupting unterminated closed thread in stage InputFetchStage at priority 3
There are a couple of problems here. One is that the updated match mechanism does not account for a null queueEntry as expected.
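To illustrate the kind of guard that would stop the NullPointerException from terminating MatchStage, here is a minimal sketch. It assumes a dequeue can yield no entry (null), and in that case it rejects the match instead of dereferencing it. QueueEntry here is a hypothetical stand-in: only the getPlatform() name comes from the NPE message above, and the real type is a protobuf message. Platform matching is simplified to exact key/value equality, which is also an assumption. This is one plausible shape for the guard, not the fix that landed upstream.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; not buildfarm's actual DequeueMatchEvaluator. */
public final class NullSafeMatch {
  private NullSafeMatch() {}

  // Stand-in for build.buildfarm.v1test.QueueEntry (assumption: platform as a simple map).
  static final class QueueEntry {
    private final Map<String, String> platform;

    QueueEntry(Map<String, String> platform) {
      this.platform = platform;
    }

    Map<String, String> getPlatform() {
      return platform;
    }
  }

  static boolean shouldKeepOperation(Map<String, String> workerProvisions, QueueEntry queueEntry) {
    if (queueEntry == null) {
      // No entry was dequeued: reject instead of letting an NPE kill the match stage.
      return false;
    }
    // Simplified rule (assumption): every requested property must be provided verbatim.
    for (Map.Entry<String, String> required : queueEntry.getPlatform().entrySet()) {
      if (!Objects.equals(workerProvisions.get(required.getKey()), required.getValue())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> worker = Map.of("OSFamily", "Linux");
    System.out.println(shouldKeepOperation(worker, null));
    System.out.println(shouldKeepOperation(worker, new QueueEntry(Map.of("OSFamily", "Linux"))));
  }
}
```

Returning false keeps the caller's polling loop alive. Whether a null entry should instead end the match attempt outright depends on how the surrounding loop is written.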
What revision was this test conducted at? Line numbers don't currently match up with what's being presented here.
This should be against v1.9.3, though I may have moved forward a few revisions during this testing. If so, it was most likely f21951b59f7af90143e38d0f10cbe761abe68de9.
Some of this noise should be reduced by the changes in #897. These are all secondary effects: nothing here is specifically broken unless you can show that some blobs cannot converge under hex digest bucketing. We also intended hex bucketing to be backwards compatible, moving existing entries into the correct bucket. Are you finding that particular behavior broken? I thought I added a test for it.
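The "move them into the correct place" behavior can be pictured with a small sketch. The layout rule is inferred from a single log path above, /var/buildfarm/worker/cache/66/663cb99dd9cb52b9f91e1438223ed291e703ec09_dir. That path suggests that with hex_bucket_levels: 1 an entry lives under a subdirectory named by the first two hex characters of its name. Extending this to level i using characters [2i, 2i+2) is an assumption. The class and its bucketedPath and migrate helpers are hypothetical, not CASFileCache's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of hex-bucket layout and flat-to-bucketed migration. */
public final class HexBucketMigration {
  private HexBucketMigration() {}

  // Assumed rule: bucket level i is named by hex characters [2i, 2i + 2) of the entry name.
  static Path bucketedPath(Path root, String entryName, int levels) {
    Path dir = root;
    for (int level = 0; level < levels; level++) {
      dir = dir.resolve(entryName.substring(2 * level, 2 * level + 2));
    }
    return dir.resolve(entryName);
  }

  // Moves a file entry left directly under the cache root into its bucket.
  // Replacing an existing target is assumed safe because entries are content-addressed.
  // Note: REPLACE_EXISTING cannot replace a non-empty directory target.
  static Path migrate(Path root, Path flatEntry, int levels) throws IOException {
    Path target = bucketedPath(root, flatEntry.getFileName().toString(), levels);
    Files.createDirectories(target.getParent());
    return Files.move(flatEntry, target, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("cas-cache");
    Path flat = Files.write(root.resolve("663cb99dd9cb52b9f91e1438223ed291e703ec09"), new byte[] {42});
    Path moved = migrate(root, flat, 1);
    System.out.println(root.relativize(moved));
  }
}
```

Moving rather than deleting preserves cached content across the layout change. The later observation in this thread that enabling bucketing wipes previous entries would correspond to the delete path instead.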
Did y'all end up running hex_bucket_levels > 0 in prod? I was looking at some of this code earlier today and was curious how it played out.
I haven't tried it since George's changes in #897. It did not work in 1.9.3, but definitely post an update if you try it and it works; I wouldn't mind enabling it.
Cool, I'll circle back with how it plays out. I'm testing with clean disks right now against some real traffic 🤞. In my local tests I mucked around with the backwards compatibility on today's master and couldn't get it to break, so I've proceeded to canary this. It does completely blow away the existing state and delete the previous entries. This seems consistent with what you saw here:
[INFO ] build.buildfarm.cas.cfc.CASFileCache logCacheScanResults - {"keys":94457,"dirs":1051,"delete":49686}
I'm curious myself whether anything prevents this from being used. It's not a lot of code, but it was added to (maybe) avoid any linear performance cost from having huge numbers of files in one directory, or to make our CAS directories easier to handle for tools that scan them. XFS, our recommended filesystem, does not appear to have any obvious linear scaling cost for the operations we perform (create+write new, delete, read, hardlink).
@werkt Overall this has been running fine through prod traffic for the afternoon, with no issues yet. Caveat: I haven't come close to peak eviction size yet, so I won't know for a while, but I will post back.
I re-ran the numbers on this, and even a value of one gets a theoretical 256x improvement in file layout, plus further capacity if you're limited by the number of directories. For folks running BF on filesystems other than XFS, this also improves quality of life. Based on experience with a similar algorithm a few years ago in a different setting and filesystem, I imagined this bucketing might even be a sensible setting to flip on. I was prepared to add this feature but wasn't surprised it was already there!
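To make the 256x figure concrete: if each bucket level is named by two more hex characters (16 × 16 = 256 choices), then hex_bucket_levels = n yields 256^n leaf buckets. The sketch below assumes uniformly distributed digests and reuses the "keys" count from the cache scan log above purely as an example input. The class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope fanout calculation for hex bucketing. */
public final class BucketFanout {
  private BucketFanout() {}

  // Each level adds one pair of hex characters, i.e. 256 possible subdirectories.
  static long bucketCount(int levels) {
    long count = 1;
    for (int i = 0; i < levels; i++) {
      count *= 256;
    }
    return count;
  }

  // Average entries per leaf directory, assuming digests spread uniformly.
  static double entriesPerBucket(long entries, int levels) {
    return (double) entries / bucketCount(levels);
  }

  public static void main(String[] args) {
    long keys = 94457; // "keys" from the logCacheScanResults line above
    for (int levels = 0; levels <= 2; levels++) {
      System.out.printf("levels=%d buckets=%d avg entries/bucket=%.1f%n",
          levels, bucketCount(levels), entriesPerBucket(keys, levels));
    }
  }
}
```

At one level, roughly 94k entries drop to a few hundred per directory. A second level mostly adds directory overhead at this cache size, which may be why values above 1 are rarely needed.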
Do you have any data or anecdotes about setting this to something greater than 1?