Trevor Hickey
> @luxe nice! I was wondering if you were able to continue a worker end to end after a transient redis failure with this PR? Added a testing section to the PR....
> > @luxe nice! I was wondering if you were able to continue a worker end to end after a transient redis failure with this PR? > > Added a testing section...
> This definitely helps with some of the redis problems I was hitting, and it is working well in general. The only other thing I thought of was backoff: perhaps could propose...
All of the `DEADLINE_EXCEEDED` failures are for the same `remote_addr=/10.35.222.84:8981`. I would think that a worker was to blame. If a single worker is timing out connections and it is...
I believe the requeue attempts are for the operation queue only, not the prequeue. The DispatchMonitor will scan for dispatchedOperations and put items back on the operation queue with a...
To clarify, it's your prequeue that grows, right? Your operation queue remains flat?
The other interesting thing is that `findMissingBlobs` is a very small, low-traffic request. We don't even allow the request to be [over 4 MB](https://github.com/bazelbuild/bazel-buildfarm/blob/4e52bc66e9fc1e456105929008c7f646a1b37f2c/src/main/java/build/buildfarm/instance/stub/StubInstance.java#L415). So it doesn't seem like...
I can confirm that in some cases we see the worker time dominated by `charge`. I suppose this happens when the CAS is over its max size, and many `puts`...
I'd say the priority queue's implementation is just not efficient. The interface behaves as advertised, but the Lua code it calls needs to be removed or reworked. Outside the priority queue, the...
That PR improved it. We should improve the priority queue implementation further.