bazel-buildfarm

Hex bucketing does not work in the production level environment

Open 80degreeswest opened this issue 4 years ago • 8 comments

While we had success enabling hex bucketing in dev and stage, the container crashed when I attempted to enable it in production under moderate load.

hex_bucket_levels: 1

[INFO   ] build.buildfarm.worker.shard.Worker <init> - buildfarm-worker-10.35.222.15:8981-d655c277-29ae-4606-9621-806b8c5577d5 initialized
[INFO   ] build.buildfarm.cas.cfc.CASFileCache start - Initializing cache at: /var/buildfarm/worker/cache
[INFO   ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Scanning Cache Root...
[INFO   ] build.buildfarm.cas.cfc.CASFileCache logCacheScanResults - {"keys":94457,"dirs":1051,"delete":49686}
[INFO   ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Populating Directories...
[INFO   ] build.buildfarm.cas.cfc.CASFileCache logComputeDirectoriesResults - {"invalid dirs":0}
[INFO   ] build.buildfarm.cas.cfc.CASFileCache start - Creating Index
[INFO   ] build.buildfarm.cas.cfc.CASFileCache start - Index Created
[INFO   ] build.buildfarm.cas.cfc.CASFileCache start - Startup Time: 9s
[INFO   ] build.buildfarm.metrics.prometheus.PrometheusPublisher startHttpServer - Started Prometheus HTTP Server on port 9090
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/75f75c2c-9f84-42e4-84a5-504d3d62e136
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5257bcc2-3a06-4cd3-9829-e4e008f03ee1
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5e8fd818-8955-452a-9c8c-968a929bdf5a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ad71de7a-89c0-4cd2-96ca-a0f557fc75b9
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/15a29715-3f44-4fea-bc75-a71346363ea1
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ddd2f02f-44c2-4798-9c46-f95711e35aba
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1a4bf314-dfe4-4318-9edb-5c46aa01cf0d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/38517b9e-d03b-492f-ad8c-48bab12c424b
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/c837c7b7-5e39-4dc3-87ff-f9acb4fd0664
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9802f9f7-d6f6-454a-9613-d96b1b168deb
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fadb11dd-9d66-489b-ac02-17209c07e527
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/16caf333-3988-4537-9db5-d41f86d91b4d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/0eb949b4-6fab-48bd-9ebd-be0a15503027
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bd278e33-fdf7-4163-8aa2-4dde25866c49
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/2933b14d-1222-4ac5-bd0f-6f655c6547c6
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1262d023-a4fa-4d99-b5f5-7eb790418977
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/241e829a-f73f-40e8-8b9a-085627093f2f
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/e5ee6678-b561-4f1e-a575-a8518ab07bff
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/7d6bfbe9-5b33-4091-a882-6523dc0d8c89
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ae5b9793-69e0-46c7-a9f5-e4127672f927
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d5ccb622-bc32-4b70-94d7-c4a0740ada6b
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/46b70620-eb4b-48ce-a7d7-89fd54aeada0
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/740601c2-1cbb-4c22-a771-f036af984926
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/4c2dbd8e-a377-4de9-8382-a58b5ccf9320
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bfba36ca-6840-4208-ac8b-c6d527bceb93
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/352b71cd-f99b-4661-b9ff-d71ecf7f4ffa
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/19be605c-3895-4d4f-8eef-d69b5ced16bc
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/0696315f-1713-4e09-8458-360e093c95d2
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fa155540-a2c5-47e5-bc3d-411aef239958
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d8c5b63a-7ddf-4730-aacc-2179fe964045
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/538709d8-c4ab-472d-8057-cd8c42137289
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d8aa22b1-3eea-4475-9551-e551ef581319
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/3b84542a-e620-4a26-9fbf-87f294cf13da
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/72bb33f6-aada-4be4-91b1-831fffc82f14
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/7f127e0c-7f6e-47a9-9ad4-7d719093c56a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/8ed1e62a-dfe0-4854-bc4d-1df00641f0e9
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/541bfc7b-3beb-4432-99db-883aac87be31
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/74e8770e-8236-47e1-9a3f-2e6730ae244a
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/95cd1be1-f3e4-4803-b205-dc634f4cefaa
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/c50c4168-b4b4-4dc9-ada6-469a1a965035
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/5881eb3d-1acd-416e-bbe3-0581014fe757
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/fde43662-f445-4d54-ac00-ba7b299c9862
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/bfb43ba0-ddf8-4ac4-b657-f66bccfeb968
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/d34ed40c-5d4d-4ea2-a5af-16e30138b05d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/4fe2f46e-1343-44a1-b4d7-57782c4ff962
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/3a5daaf5-b4eb-436b-a94d-5df1b871e4ca
[WARNING] build.buildfarm.cas.cfc.CASFileCache copyExternalInput - error downloading 8a5a830f61521bf482f3a286cb8531485f60895c/1692
java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
        at build.buildfarm.instance.shard.RemoteInputStreamFactory$1.onFailure(RemoteInputStreamFactory.java:256)
        at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:980)
        at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:970)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:321)
        at com.google.common.util.concurrent.MoreExecutors$5.execute(MoreExecutors.java:1108)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
        at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:47)
        at build.buildfarm.instance.shard.Util$1.complete(Util.java:129)
        at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:142)
        at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:126)
        at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:164)
        at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:161)
        at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:197)
        at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:188)
        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
        at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:558)
        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:531)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
        at java.base/java.lang.Thread.run(Thread.java:832)

[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/161372a8-46ed-4102-84ae-767bd66a6485
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/ad1d3b40-28f1-4e3c-9ea0-fc47ad238aa3
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9f92ae70-9c9b-46ad-befa-821d32a9c9dd
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/6578de7d-8d14-4413-bb7b-04da58a1457d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/b4b1dd06-3ef9-4ae9-afc0-c4aaf048a66d
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/1c9a3034-86bb-4283-bb5f-4f81ad94486c
[SEVERE ] build.buildfarm.worker.PipelineStage run - MatchStage::run(): stage terminated due to exception
java.lang.NullPointerException: Cannot invoke "build.buildfarm.v1test.QueueEntry.getPlatform()" because "queueEntry" is null
        at build.buildfarm.worker.DequeueMatchEvaluator.shouldKeepOperation(DequeueMatchEvaluator.java:57)
        at build.buildfarm.worker.shard.ShardWorkerContext.matchInterruptible(ShardWorkerContext.java:315)
        at build.buildfarm.worker.shard.ShardWorkerContext.match(ShardWorkerContext.java:378)
        at build.buildfarm.worker.MatchStage.iterate(MatchStage.java:141)
        at build.buildfarm.worker.PipelineStage.runInterruptible(PipelineStage.java:44)
        at build.buildfarm.worker.PipelineStage.run(PipelineStage.java:51)
        at java.base/java.lang.Thread.run(Thread.java:832)

[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - error creating exec dir for shard/operations/9a6f04f2-7f80-41a9-bf3e-21772fa03a6f
build.buildfarm.worker.shard.CFCExecFileSystem$ExecDirException: /var/buildfarm/worker/shard/operations/9a6f04f2-7f80-41a9-bf3e-21772fa03a6f: 1 exceptions
        at build.buildfarm.worker.shard.CFCExecFileSystem.checkExecErrors(CFCExecFileSystem.java:307)
        at build.buildfarm.worker.shard.CFCExecFileSystem.createExecDir(CFCExecFileSystem.java:371)
        at build.buildfarm.worker.shard.ShardWorkerContext.createExecDir(ShardWorkerContext.java:721)
        at build.buildfarm.worker.InputFetcher.fetchPolled(InputFetcher.java:179)
        at build.buildfarm.worker.InputFetcher.runInterruptibly(InputFetcher.java:85)
        at build.buildfarm.worker.InputFetcher.run(InputFetcher.java:269)
        at java.base/java.lang.Thread.run(Thread.java:832)
        Suppressed: build.buildfarm.cas.cfc.CASFileCache$PutDirectoryException: /var/buildfarm/worker/cache/66/663cb99dd9cb52b9f91e1438223ed291e703ec09_dir: 1 exceptions
                at build.buildfarm.cas.cfc.CASFileCache.lambda$putDirectorySynchronized$19(CASFileCache.java:2177)
                at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:213)
                at com.google.common.util.concurrent.AbstractTransformFuture$AsyncTransformFuture.doTransform(AbstractTransformFuture.java:202)
                at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:118)
                at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1429)
                at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
                at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
                at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
                at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
                at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
                Suppressed: java.util.concurrent.ExecutionException: java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
                        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:566)
                        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:527)
                        at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:88)
                        at build.buildfarm.cas.cfc.CASFileCache.lambda$putDirectorySynchronized$19(CASFileCache.java:2168)
                        ... 9 more
                Caused by: java.nio.file.NoSuchFileException: 8a5a830f61521bf482f3a286cb8531485f60895c/1692
                        at build.buildfarm.instance.shard.RemoteInputStreamFactory$1.onFailure(RemoteInputStreamFactory.java:256)
                        at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:980)
                        at build.buildfarm.instance.shard.ShardInstance$WorkersCallback.onSuccess(ShardInstance.java:970)
                        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
                        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
                        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
                        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
                        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
                        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
                        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
                        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
                        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
                        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
                        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
                        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
                        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
                        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:321)
                        at com.google.common.util.concurrent.MoreExecutors$5.execute(MoreExecutors.java:1108)
                        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
                        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
                        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
                        at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:47)
                        at build.buildfarm.instance.shard.Util$1.complete(Util.java:129)
                        at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:142)
                        at build.buildfarm.instance.shard.Util$1.onSuccess(Util.java:126)
                        at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:164)
                        at build.buildfarm.instance.shard.Util$2.onSuccess(Util.java:161)
                        at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:197)
                        at build.buildfarm.instance.shard.Util$3.onSuccess(Util.java:188)
                        at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1080)
                        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
                        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
                        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
                        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
                        at com.google.common.util.concurrent.AbstractTransformFuture$TransformFuture.setResult(AbstractTransformFuture.java:247)
                        at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:163)
                        at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
                        at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
                        at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
                        at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:746)
                        at io.grpc.stub.ClientCalls$GrpcFuture.set(ClientCalls.java:558)
                        at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:531)
                        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
                        at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
                        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
                        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
                        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
                        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
                        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
                        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
                        at java.base/java.lang.Thread.run(Thread.java:832)

[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/9b305073-2647-443b-a482-3fbfd72bc8f2
[SEVERE ] build.buildfarm.worker.InputFetcher fetchPolled - invalid queued operation: shard/operations/927f28e7-c641-424d-9a90-462df86da47e
[INFO   ] build.buildfarm.worker.Pipeline join - Interrupting unterminated closed thread in stage InputFetchStage at priority 3

80degreeswest, Aug 11 '21 13:08

There are a couple of problems here. One is that the updated match mechanisms are not accounting for a null queueEntry as expected.
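For illustration, the NullPointerException in the log above comes from calling getPlatform() on a null queueEntry. A minimal sketch of a guard is below. The QueueEntry class here is a hypothetical stand-in for build.buildfarm.v1test.QueueEntry, the matching rule is simplified, and returning false for a null entry is an assumption, not necessarily what buildfarm does.

```java
import java.util.Map;

public class NullQueueEntryGuardSketch {
  // Hypothetical stand-in for build.buildfarm.v1test.QueueEntry; only the accessor used here.
  static final class QueueEntry {
    private final Map<String, String> platform;

    QueueEntry(Map<String, String> platform) {
      this.platform = platform;
    }

    Map<String, String> getPlatform() {
      return platform;
    }
  }

  // Assumption: a null entry means nothing usable was dequeued, so there is nothing to keep.
  static boolean shouldKeepOperation(Map<String, String> workerProvisions, QueueEntry queueEntry) {
    if (queueEntry == null) {
      return false;
    }
    // Simplified matching rule (an assumption): every requested property must be provided.
    return workerProvisions.entrySet().containsAll(queueEntry.getPlatform().entrySet());
  }

  public static void main(String[] args) {
    Map<String, String> provisions = Map.of("os", "linux");
    System.out.println(shouldKeepOperation(provisions, null));
    System.out.println(shouldKeepOperation(provisions, new QueueEntry(Map.of("os", "linux"))));
  }
}
```

With a guard like this, the match stage would skip the null entry rather than terminate with an exception.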

What revision was this test conducted at? Line numbers don't currently match up with what's being presented here.

werkt, Aug 23 '21 13:08

This should be against v1.9.3. I may be wrong though as I may have moved forward a few revisions from there during this testing. If not, then it was most likely f21951b59f7af90143e38d0f10cbe761abe68de9.

80degreeswest, Aug 23 '21 13:08

Some of this noise should be reduced by changes in #897. These are also all secondary effects: nothing here is specifically broken, unless you can prove that some blobs are unconvergable with hex digest. We also intended to be backwards compatible with the hex buckets, where it should move them into the correct place. Are you finding that particular behavior broken? I thought I added a test for it.
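To make the intended backwards-compatible behavior concrete, here is a rough sketch of moving flat cache entries into their buckets. This is not buildfarm's implementation: the class and method names are hypothetical, it assumes one bucket level named by the first two characters of the entry name, and it only handles regular files (directory entries such as the `_dir` ones are skipped).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BucketMigrationSketch {
  // Moves regular files that sit directly under root into root/<first two chars>/.
  // Returns the number of files moved.
  static int migrateFlatEntries(Path root) {
    try {
      // Snapshot the listing first so the moves do not disturb the iteration.
      List<Path> flat = new ArrayList<>();
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
        for (Path entry : entries) {
          if (Files.isRegularFile(entry) && entry.getFileName().toString().length() > 2) {
            flat.add(entry);
          }
        }
      }
      for (Path entry : flat) {
        String name = entry.getFileName().toString();
        Path bucket = root.resolve(name.substring(0, 2));
        Files.createDirectories(bucket);
        Files.move(entry, bucket.resolve(name));
      }
      return flat.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Demo helper: creates a temporary directory holding empty files with the given names.
  static Path scratchDir(String... names) {
    try {
      Path root = Files.createTempDirectory("hex-bucket-sketch");
      for (String name : names) {
        Files.createFile(root.resolve(name));
      }
      return root;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path root = scratchDir("ab12cd", "ab34ef", "ff00aa");
    System.out.println(migrateFlatEntries(root) + " entries moved under " + root);
  }
}
```

Running the migration a second time on the same root moves nothing, since the entries are no longer directly under the root.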

werkt, Aug 25 '21 20:08

Did y'all end up running hex_bucket_levels > 0 in prod yet? I was looking at some of this code earlier today and was curious if it played out.

jerrymarino, Apr 20 '23 17:04

I haven't tried it since George's changes in #897. It did not work in 1.9.3. But definitely post an update if you try it and it works, as I wouldn't mind enabling it.

80degreeswest, Apr 20 '23 17:04

Cool, I'll circle back with how it plays out. I'm testing in the context of clean disks right now against some real traffic 🤞. In my local tests I did muck around with the backwards compat on today's master and couldn't get it to break, so I've proceeded to canary this. It totally blows away the state / deletes the previous entries. This seems aligned with what you saw here:

[INFO ] build.buildfarm.cas.cfc.CASFileCache logCacheScanResults - {"keys":94457,"dirs":1051,"delete":49686}

jerrymarino, Apr 20 '23 19:04

Curious myself if there's anything preventing this from being used: it's not a lot of code, but it was done to (maybe) improve performance beyond any linear cost implied by huge numbers of files in one directory, or to make our CAS directories easier to handle for tools that might scan them. XFS, our recommended filesystem, does not appear to have any obvious linear scaling cost relative to the operations we perform (create+write new, delete, read, hardlink).

werkt, Apr 20 '23 23:04

@werkt overall this has been running fine through prod traffic for the afternoon with no issues yet. Caveat: I haven’t even come close to peak eviction size yet, so I won’t know for a while, but I will post back.

I re-ran the numbers on this, and even a value of one gets a theoretical 256x improvement in file layout, plus further capacity if you’re limited by number of directories. Where folks are running BF outside of XFS, this also improves quality of life. Based on an orthogonal experience employing a similar algorithm a few years ago in another setting and FS, I imagined this bucketing might even be a sensible setting we can flip. I was prepared to add this feature but wasn’t surprised it was already there!
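For anyone following the arithmetic: the 256x figure follows if one bucket level is named by two hex characters of the digest (16 × 16 = 256 directories), which is consistent with the path /var/buildfarm/worker/cache/66/663cb99dd9cb52b9f91e1438223ed291e703ec09_dir in the original report. A small sketch of that layout is below. The class and method names are hypothetical, and the assumption that each additional level uses the next two characters is mine, not something confirmed in this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HexBucketSketch {
  // Assumption: level i of the bucket path is named by characters [2i, 2i+2) of the key.
  static Path bucketedPath(Path root, String key, int levels) {
    Path dir = root;
    for (int i = 0; i < levels; i++) {
      dir = dir.resolve(key.substring(2 * i, 2 * i + 2));
    }
    return dir.resolve(key);
  }

  // Two hex characters per level give 256 leaf buckets per level, so 256^levels in total.
  static long bucketCount(int levels) {
    return 1L << (8 * levels);
  }

  public static void main(String[] args) {
    Path root = Paths.get("/var/buildfarm/worker/cache");
    String key = "663cb99dd9cb52b9f91e1438223ed291e703ec09_dir";
    System.out.println(bucketedPath(root, key, 1));
    System.out.println("leaf buckets at 1 level: " + bucketCount(1));
    System.out.println("leaf buckets at 2 levels: " + bucketCount(2));
  }
}
```

Under these assumptions a second level would spread entries over 65536 leaf directories, so whether values above 1 help depends on how many entries the cache actually holds.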

Do you have any anecdotes about setting this to something greater than 1?

jerrymarino, Apr 21 '23 01:04