faabric
Scaling issues
This is a grab-bag of issues encountered when scaling Faasm and Faabric to many requests per second.
Solutions to most of these problems are implemented in this fork: https://github.com/auto-ndp/faabric
There is a corresponding issue in Faasm: https://github.com/faasm/faasm/issues/504
- [x] Executor initialization is synchronous within the scheduler thread, before the child executor thread is spawned, thus blocking the scheduler. Instead it can be moved to just before the thread loop in the Executor (see the first sketch after this list).
- [x] ZeroMQ socket explosion - for each Executor we open several sockets to handle the different client/server pairs. This means we open `n_executors * n_ports_per_executor` sockets, which hits system-wide limits with a lot of Executors. Instead we could multiplex more/all calls through a single socket (see the proxy sketch after this list). Because Executors are handled with separate threads, it would not be possible to cache the clients (as 0MQ sockets cannot be shared between threads). AFAICT #260 should address this for MPI, which was the main culprit.
- [x] ZeroMQ lacking thread safety - There are a few places where the transport code is (potentially unnecessarily) complex due to the lack of thread safety in ZeroMQ. ZeroMQ's successor's successor, nng, could be used as a thread-safe replacement. (Done in #286)
- [x] The current HTTP Endpoint implementation can quickly become a bottleneck when serving lots of external requests. An alternative async HTTP implementation based on Boost Beast has worked well and can handle thousands of concurrent connections per worker thread. (Done in #274)
- [ ] The use of Redis to return function results is unnecessary when the call is synchronous, as the result just needs to be returned to the calling host. This can be done with a direct ZeroMQ message, similar to how we handle thread results (see the sketch after this list).
- [x] Use of Redis for discovery may also be unnecessary if we can have a proxy running somewhere in the cluster, or use some form of broadcast (#300)
- [x] Executor shutdown doesn't clean up resources, it just moves the Executor to a vector of dead Executors. This is done to avoid some kind of deadlock (according to the comments), but it causes a serious memory leak if the Executors load a lot of data. There should be a way to prune these dead executors, either based on time or the overall function lifecycle. #252
- [ ] Execution does not time out, even if the calling client times out. This is difficult, as we would need some sort of monitoring thread to kill long-running tasks, and it would be impossible to tell whether they had hung or were just genuinely long-running. Instead, we could add a check to the scheduler to avoid executing any tasks that have already passed their timeout before execution (see the last sketch after this list). This would avoid a traffic-jam problem, but not solve the original lack of a timeout.
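A rough sketch of the Executor initialisation change (class and method names are illustrative, not the real faabric API): the scheduler thread only spawns the worker thread, and the expensive set-up runs on that thread just before the task loop, so scheduling a new Executor returns immediately.

```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch: heavy Executor set-up happens on the executor thread
// itself, not on the scheduler thread that constructs the Executor.
class Executor
{
  public:
    Executor()
    {
        // Called from the scheduler thread: only spawns the thread, no heavy work
        workerThread = std::thread([this] {
            initialise();     // expensive set-up, now off the scheduler thread
            threadPoolLoop(); // normal task-execution loop
        });
    }

    ~Executor()
    {
        stopped.store(true);
        if (workerThread.joinable()) {
            workerThread.join();
        }
    }

  private:
    void initialise()
    {
        // e.g. restore snapshots, set up module state
    }

    void threadPoolLoop()
    {
        while (!stopped.load()) {
            // pop and execute tasks (a blocking pop with a timeout in reality)
        }
    }

    std::thread workerThread;
    std::atomic<bool> stopped{ false };
};
```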
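A rough sketch of multiplexing outgoing messages through a single socket: each executor thread writes to a cheap `inproc` socket, and one long-lived proxy thread owns the external connection, so the socket count no longer scales with the number of Executors. Endpoint names and socket types are illustrative, and this assumes a single global ZeroMQ context.

```cpp
#include <string>
#include <zmq.hpp>

// Hypothetical sketch: a one-way "streamer" proxy. Run this on its own
// long-lived thread, e.g. std::thread(runOutgoingProxy, std::ref(ctx), addr).
void runOutgoingProxy(zmq::context_t& ctx, const std::string& remoteAddr)
{
    // The single external socket for this destination
    zmq::socket_t external(ctx, zmq::socket_type::push);
    external.connect(remoteAddr);

    // In-process fan-in point for all executor threads
    zmq::socket_t internal(ctx, zmq::socket_type::pull);
    internal.bind("inproc://outgoing");

    // Blocks forever, forwarding messages from internal to external
    zmq_proxy(internal.handle(), external.handle(), nullptr);
}

// Each executor thread keeps one cheap inproc socket; only the proxy thread
// holds the actual TCP connection. Assumes ctx is the same on every call.
void sendFromExecutorThread(zmq::context_t& ctx, const std::string& payload)
{
    thread_local zmq::socket_t toProxy = [&ctx] {
        zmq::socket_t s(ctx, zmq::socket_type::push);
        s.connect("inproc://outgoing");
        return s;
    }();

    toProxy.send(zmq::buffer(payload), zmq::send_flags::none);
}
```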
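A rough sketch of returning a synchronous result straight to the calling host over ZeroMQ rather than via Redis, mirroring how thread results are shipped. The port number and on-the-wire layout are made up; in practice this would go through the existing transport classes.

```cpp
#include <string>
#include <zmq.hpp>

// Hypothetical sketch: the worker pushes the serialised result directly to
// the calling host (port 8006 is arbitrary here).
void returnResultToCaller(zmq::context_t& ctx,
                          const std::string& callerHost,
                          const std::string& serialisedResult)
{
    zmq::socket_t sock(ctx, zmq::socket_type::push);
    sock.connect("tcp://" + callerHost + ":8006");
    sock.send(zmq::buffer(serialisedResult), zmq::send_flags::none);
}

// The calling host blocks on a PULL socket until the result arrives, instead
// of polling Redis.
std::string awaitResult(zmq::context_t& ctx)
{
    zmq::socket_t sock(ctx, zmq::socket_type::pull);
    sock.bind("tcp://*:8006");

    zmq::message_t msg;
    auto res = sock.recv(msg, zmq::recv_flags::none);
    (void)res;

    return std::string(static_cast<const char*>(msg.data()), msg.size());
}
```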
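A rough sketch of the scheduler-side check for the timeout point: before executing a queued task, check whether the caller's timeout has already elapsed and fail fast if so. Field names are illustrative, not the real Message fields, and this deliberately does not kill tasks that are already running.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch of the data the check needs
struct QueuedTask
{
    int64_t submittedAtMs = 0; // when the call was submitted
    int64_t timeoutMs = 0;     // caller-supplied timeout (0 = no timeout)
};

// Returns true if the caller has certainly given up already, so the scheduler
// can drop or fail the task instead of executing it.
bool hasAlreadyTimedOut(const QueuedTask& task)
{
    using namespace std::chrono;
    int64_t nowMs =
      duration_cast<milliseconds>(system_clock::now().time_since_epoch())
        .count();

    return task.timeoutMs > 0 && (nowMs - task.submittedAtMs) > task.timeoutMs;
}
```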
Awesome, thanks @eigenraven. I'll rearrange the raw text above and split between this issue and https://github.com/faasm/faasm/issues/504 for the Faasm-specific stuff.
Thinking about the protobuf inefficiencies: because gRPC is gone, I don't think there's any reason not to convert everything to flatbuffers at this point.
Yes, this was the ultimate aim when we started using FB, as it would remove many of the serialisation issues (all the TODOs sprinkled about the place regarding copies). My gut feeling is that this would be non-trivial, but it could be done object-by-object (leaving the big Message class till last).