at_server
at_server copied to clipboard
Server can lock up during a large initial monitor response
Describe the bug
- Code in
MonitorVerbHandler.processVerbwill hold onto the event loop until it completes. In turn this means that, if themonitor:request results in a large number of notifications being sent to the client in the initial monitor response, the server becomes completely unresponsive to other requests. - Code snippet from
MonitorVerbHandler.processVerb:var receivedNotifications = await _getReceivedNotificationsAfterEpoch(fromEpochMillis); for (var notification in receivedNotifications) { processReceivedNotification(notification); } - Primary problem is that processReceivedNotification is not async, it calls no awaits, so if the
receivedNotificationslist is large, no other event will be handled until theforloop completes.
There are additional issues that should be cleaned up while fixing this bug
- If the
receivedNotificationslist is very large, then it could cause the server process to use a lot of memory - This code currently first of all gets all matching notifications, then sends them. For a large number of notifications, this will introduces noticeable latency before notifications start to be sent to the client
- Lastly - currently we first of all register the notification callback, and then start sending the matched notifications.
- Let's consider the following scenario:
- Client sends
monitorrequest - Server registers notification callback
- Server sends matched notifications to the initial response
- Client sends
- Now imagine that step 3 takes a long time, and while it is being executed, a notification arrives that should be sent to the client
- But if we allow other events to get a look-in, then we could have a notification arrive and get sent to the client, and it will be out-of-order because the server is still sending older matching notifications
- With the current code, this is not actually possible, since the event loop is being hogged by the initial monitor response
- Let's consider the following scenario:
To Reproduce Steps to reproduce the behavior:
- Have
@bobsend 10,000 notifications to@alice - Have
aliceclient sendmonitor:0to@alicesecondary - While the 10,000 notifications are being returned, the server will not respond to any other commands.
Expected behavior
-
MonitorVerbHandler.processVerbneeds to allow another event to get a look-in. (Easy) - We implement the fix in such a way as to ensure that out-of-order notifications can not happen (Quite difficult)
Adding @purnimavenkatasubbu to help develop a test suite for this scenario detailed in the ticket
No impl progress during PR47 due to lots of other unplanned work. Moving to PR48. Reducing estimate to 5SP as can reuse several aspects of the fsync implementation.