Allow setting the maximum number of OS threads
Go limits a program to 10,000 OS threads by default, and the runtime crashes the process if that limit is exceeded. bazel-remote can easily exceed this limit when serving many requests. Adding a flag to configure the limit works around the problem.
@ulrfa encountered this limit a while back, and IIRC in that case increasing this limit only helped briefly because of the amount of disk IO backlog and the size of the incoming IO spike. See #638 and #696. I wonder if it's worth landing some version of that change instead?
I agree that increasing the number of threads is not suitable for addressing this issue.
We've been running #696 in production since 2023, and it has proven efficient and stable for both gRPC and HTTP with the disk backend. However, we have not tested it with proxy backends.
Do you want me to try rebasing #696 on top of the latest master?
Yes please. Perhaps @AlessandroPatti can then try it out also.
For us, increasing the OS limit did help. My understanding is that there are two main sources of goroutines: evictions and requests. Improving the evictions might help towards the OS threads issue, but you could still have it if the cache is receiving a lot of requests?
> For us, increasing the OS limit did help.
For us it did not. It still crashed, just a bit later. The evictions can be unpredictable, e.g. if a new huge file suddenly evicts thousands of old small files.
> My understanding is that there are two main sources of goroutines: evictions and requests.
Yes, evictions and Put requests writing to disk, because those involve blocking system calls. However, I have never observed (non-proxied) Get requests consuming any significant number of OS threads.
> Improving the evictions might help towards the OS threads issue, but you could still have it if the cache is receiving a lot of requests?
#696 limits concurrent requests with a semaphore and performs evictions in a single background goroutine. This approach not only avoids running out of OS threads but also improves performance. See the commit messages for details.
I plan to push a rebase of #696 tomorrow.
#696 is now rebased.