pobox
Future Enhancements
We've been using pobox for a while now and we have identified a set of issues, some of which might need a new major version to resolve. My goal here is to get some feedback.
The first issue: messages have equal weight. That is not suitable for situations where messages have significantly different sizes, affecting the memory footprint of the buffer. This would mean that the PO Box would effectively have two limits: a message count limit (total number of messages) and a total buffer weight limit (sum of all message weights). The buffer would store a structure like {Weight, Message}. The weight could also be used by the filtering function in the active state (desired weight vs. desired message count).
A friend suggested that a configurable function to calculate weights would be more flexible. The downside is that calculations done on the pobox side may affect performance. Hence we propose an explicitly stored, pre-calculated weight.
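To make the idea concrete, here is a rough sketch of a client-side wrapper; post_weighted/2 is not part of pobox today, and as written pobox:post/2 would just treat the {Weight, Msg} tuple as an ordinary message; the proposal is for the buffer to interpret it and enforce both limits:

    %% Illustrative sketch only: the client pre-calculates the weight (here the
    %% external term size in bytes, as a rough memory-footprint proxy) so the
    %% pobox process never runs user code to weigh a message.
    post_weighted(Box, Msg) ->
        Weight = erlang:external_size(Msg),
        %% Under the proposal the buffer would store {Weight, Msg} and track the
        %% running weight sum alongside the message count.
        pobox:post(Box, {Weight, Msg}).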
The second issue is that we want to fail fast when the PO Box is full (suitable for “keep-old” buffers): the client can receive early feedback from the post_sync call. This functionality already exists, but there is an issue with latency when sending many consecutive messages from the same client. We’d like to send multiple messages without waiting for a reply each time (avoiding stop-and-go communication). Support for promises similar to rpc:async_call/4 and rpc:yield/1 could be used when multiple messages need to be sent concurrently from the same process.
Note that a message can still be discarded at a later stage, even if it didn’t “fail fast”.
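A sketch of what that promise-style flow could look like, assuming hypothetical post_async/2 and wait/2 functions (neither exists in pobox today):

    %% Fire off several posts without waiting for a reply each time, then
    %% collect the outcomes, mirroring rpc:async_call/4 + rpc:yield/1.
    post_all(Box, Msgs) ->
        Keys = [pobox:post_async(Box, Msg) || Msg <- Msgs],
        %% Each result could be e.g. ok | {error, full}, so a full buffer is
        %% still reported early, just without stop-and-go latency per message.
        [pobox:wait(Key, 5000) || Key <- Keys].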
The third issue I see is that there isn't a built-in way to use a PO Box with calls: the PO Box owner responds directly to the client when processing the message received from the PO Box. I have emulated this by rolling my own fork of gen:call with support for early feedback when the PO Box is full, but I feel this belongs inside the pobox library. Here is an illustration of the happy path in this crappy ASCII diagram:
           +------+          +-----+           +-----+
           |client|          |pobox|           |owner|
           +--+---+          +--+--+           +--+--+
              |    post_call    |                 |
          +-> |---------------->|    got mail     |
          |   |                 |---------------->|
          |   |                 |     send it     |
          |   |                 |<----------------|
 client   |   |                 |    messages     |
 blocked  |   |                 |---------------->|
          |   |                 |       ...       |
          |   |                 |                 |
          |   |        reply to post_call         |
          +-> |<----------------------------------|
              |                 |                 |
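In code, the client side of that happy path might look roughly like this; post_call/3 and the {post_call, From, Request} envelope are proposals, not existing pobox API, and the return values of pobox:post_sync/3 shown here are assumed:

    %% Think of it as gen:call split across the pobox: post_sync gives the early
    %% "buffer full" feedback, and the actual reply arrives later from the owner.
    post_call(Box, Request, Timeout) ->
        Ref = make_ref(),
        case pobox:post_sync(Box, {post_call, {self(), Ref}, Request}, Timeout) of
            ok ->
                receive
                    {Ref, Reply} -> {ok, Reply}
                after Timeout -> {error, timeout}
                end;
            Other ->
                %% e.g. the buffer was full; the exact return values are assumed here.
                {error, Other}
        end.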
The message weight does sound like something that could be simple enough to add. As you said, it should probably be calculated client-side, but that could be hidden in the call/cast public interface functions, which would allow scoping the weight calculation to the callsite while hiding the implementation locally; it would probably break the handle_info functionality for bare messages though, or require some default to be configured.
It's not necessarily simple to figure out how to do it. Another thing that is harder to deal with is the impact that edge cases like "calculated weight is bigger than total configured capacity" or "invalid ranges such as negative or null weights" could have on stability; those can possibly be blocked and ignored. The weight would probably need to be an integer in [1..MAX] with a default weight of 1 in order to keep the existing logic intact as a default case.
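Something like this is what I'd have in mind for the server-side validation; the function name is only illustrative, not pobox API:

    %% Out-of-range weights are rejected rather than silently accepted, and a
    %% missing weight defaults to 1 so the existing message-count semantics
    %% remain the default case.
    check_weight(undefined, _Max)                             -> {ok, 1};
    check_weight(W, Max) when is_integer(W), W >= 1, W =< Max -> {ok, W};
    check_weight(_, _Max)                                     -> {error, invalid_weight}.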
I think support here could be added without breaking backwards compatibility, but would require passing weight-calculation operations into every call and message, even if they get validated on the processing side.
Regarding the second and third points (I think they're roughly the same thing), this makes sense. However, if we start handling bare messages and not just replies, we need to change the kind of API. It can't go through the standard posting. The easy way to do it is to spawn a process that sends the message. Another one is to establish a protocol on handle_info calls that sends a reply to a given target destination. Another one is to make a new kind of asynchronous post message.
I'm also curious about alternatives, such as:
- Allowing a status monitor to send messages when full; this doesn't tell you whether a message is accepted or dropped, but lets you know when the queue gets full
- Allowing a batch insertion mechanism, so that instead of adding one term, a list is added; the return value for each term lets you know which was handled and which wasn't; this has the advantage of reducing the amount of context switching and copying done by a process
- Publishing "fullness" status to an ETS table or a pair of counters (in newer Erlang versions; a pair is chosen for variable weights so that the client can calculate whether its message would fit or not) and making it available to the client side for reads; this could allow an early check-and-bail client-side without a major change in semantics. This early drop would likely increase the ability of the pobox process to catch up by reducing the amount of data sent to it during periods of heavy overload.
Any of these can likely be implemented without breaking backwards compatibility either.
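For the counters variant in the last bullet, a rough sketch (assuming OTP 21.2+ counters and persistent_term; none of this is pobox API today, and the pobox process itself would have to add/subtract weight on insert and drop):

    %% Index 1 holds the currently buffered weight, index 2 the configured limit.
    init_fullness(BoxName, Limit) ->
        CRef = counters:new(2, [write_concurrency]),
        counters:put(CRef, 2, Limit),
        persistent_term:put({pobox_fullness, BoxName}, CRef).

    %% Client-side early check-and-bail: skip the send entirely if the message
    %% would not fit right now. This only works for clients on the same node.
    would_fit(BoxName, Weight) ->
        CRef = persistent_term:get({pobox_fullness, BoxName}),
        counters:get(CRef, 1) + Weight =< counters:get(CRef, 2).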
IMO the trickiest one to do is the weighing one, mostly because it has far more potential edge cases, and because user customization can lead to vastly less stable usage. There's also an overhead in handling the weight in/out and pattern matching it on every message, whether they use custom weighing or not, but that could partially be offset by client-side drops that take place earlier in synchronous cases (but won't do any good for the rest).
The message weight does sound like something that could be simple enough to add. As you said, it should probably be calculated client-side, but that could be hidden in the call/cast public interface functions, which would allow scoping the weight calculation to the callsite while hiding the implementation locally; it would probably break the handle_info functionality for bare messages though, or require some default to be configured.
We can simply assume that bare messages have a default weight of 1 when in raw mode (the default), and use a weighted mode for bare messages if one is configured.
It's not necessarily simple to figure out how to do it. Another thing that is harder to deal with is the impact that edge cases like "calculated weight is bigger than total configured capacity" or "invalid ranges such as negative or null weights" could have on stability; those can possibly be blocked and ignored. The weight would probably need to be an integer in [1..MAX] with a default weight of 1 in order to keep the existing logic intact as a default case.
I think support here could be added without breaking backwards compatibility, but would require passing weight-calculation operations into every call and message, even if they get validated on the processing side.
Agreed.
Regarding the second and third points (I think they're roughly the same thing), this makes sense. However, if we start handling bare messages and not just replies, we need to change the kind of API. It can't go through the standard posting. The easy way to do it is to spawn a process that sends the message. Another one is to establish a protocol on handle_info calls that sends a reply to a given target destination. Another one is to make a new kind of asynchronous post message.
I am confused about what you mean by "they're roughly the same thing". I would like to do issue 2 without adding extra processes since this is intended for overload scenarios (think of it as gen:call split into separate send and receive functions). Establishing a protocol with a from on the handle_info of the owner is definitely what I had in mind for 3 (I was thinking of using a message format compatible with the existing OTP gen:call).
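The owner side of that protocol could then be as simple as the sketch below; do_handle/2 is a placeholder for the owner's own logic, and the exact shape of the mail message should be checked against the pobox version in use:

    %% When the pobox delivers the buffered messages, reply straight to the
    %% original caller using the gen-compatible {Pid, Ref} From; re-arming the
    %% buffer (pobox:active/3) is omitted here for brevity.
    handle_info({mail, _Box, Messages, _Count, _Dropped}, State) ->
        [gen_server:reply(From, do_handle(Request, State))
         || {post_call, From, Request} <- Messages],
        {noreply, State}.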
I'm also curious about alternatives, such as:
Allowing a status monitor to send messages when full; this doesn't tell you whether a message is accepted or dropped, but lets you know when the queue gets full
This is a cool idea.
Allowing a batch insertion mechanism, so that instead of adding one term, a list is added; the return value for each term lets you know which was handled and which wasn't; this has the advantage of reducing the amount of context switching and copying done by a process
I am a little concerned that it could cause head-of-line blocking in multi-node scenarios. Is this still a problem in the BEAM, or has it been fixed? But we could allow the caller to specify a batch size ("send these 200 messages 10 at a time").
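Something along these lines, where post_batch/2 stands in for the proposed batch API:

    %% Split a large list of messages into fixed-size batches so no single
    %% post (or dist message) grows unbounded.
    post_in_batches(_Box, [], _BatchSize) ->
        [];
    post_in_batches(Box, Msgs, BatchSize) ->
        {Batch, Rest} = lists:split(min(BatchSize, length(Msgs)), Msgs),
        [pobox:post_batch(Box, Batch) | post_in_batches(Box, Rest, BatchSize)].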
Publishing "fullness" status to an ETS table or a pair of counters (in newer Erlang versions; a pair is chosen for variable weights so that the client can calculate whether its message would fit or not) and making it available to the client side for reads; this could allow an early check-and-bail client-side without a major change in semantics. This early drop would likely increase the ability of the pobox process to catch up by reducing the amount of data sent to it during periods of heavy overload.
The issue here is that this can only be done for processes alive on the same node, which is okay as long as we keep in mind that the PO Box must still work correctly when the client is on a remote node.
Any of these can likely be implemented without breaking backwards compatibility either.
We should probably track the performance characteristics of the old and new versions of pobox to make sure we understand the performance impact.
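Even a crude measurement would help here; for example, assuming a started box, something like:

    %% Average time per post over N messages. A real comparison would also watch
    %% memory and the owner's message_queue_len under sustained overload.
    bench_posts(Box, N) ->
        {Micros, ok} = timer:tc(fun() ->
            lists:foreach(fun(I) -> pobox:post(Box, {bench, I}) end,
                          lists:seq(1, N))
        end),
        Micros / N.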
IMO the trickiest one to do is the weighing one, mostly because it has far more potential edge cases, and because user customization can lead to vastly less stable usage. There's also an overhead in handling the weight in/out and pattern matching it on every message, whether they use custom weighing or not, but that could partially be offset by client-side drops that take place earlier in synchronous cases (but won't do any good for the rest).
Agreed.
I haven't had the time to read the whole thing, but I wouldn't use POBox for a process on a remote node because the distribution port is likely to be slower than the message passing you would otherwise get. Rather, I'd expect to put POBox on the local box to throttle the data going over the dist port. I'd have to readjust expectations if that's a use case you really have.
That is something I am doing currently. As far as I know, the dist port being too slow isn't really an issue in my case.
I use PO Box also for some node-local communication.
I see a possibility of misunderstanding. What is remote?
1. Client is remote; PO Box + Owner on the same node, vs.
2. Client + PO Box on the same node; Owner is remote.
I am thinking of 1.
I would potentially use the whole client + PO Box + Owner on the same node; the owner is in charge of sending the remote data (and handling retries or not). I'm assuming that whatever the message-passing bandwidth is, it's narrower over the network than in memory, so I'd put the whole overload-handling before the socket buffer ever has a chance to fill!
Interesting observation, but then wouldn't you need to run one PO Box per node? How well does that scale? Not to mention that I use PO Box in an autoscaling group on AWS (a dynamically increasing/decreasing set of nodes). Also, in this case managing the size of the buffer would be more complicated, since you would need to resize buffers when adding and removing nodes.
Sounds like a very interesting problem to solve!
Instead of managing all that, I can imagine you would have an application, let's call it "The Post Office". It would have one process per scheduler and would collectively hold all of the buffers for the node (local and distributed ones). It would also manage buffer sizes across the cluster to avoid a NumOwners * BufferSize * NumNodes situation. We could also use TCP sockets instead of the distribution ports to avoid head-of-line blocking, combined with all of the other ideas we already mentioned.
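As a sketch of the per-scheduler routing, assuming the post office registers one buffer process per scheduler at startup under invented names:

    %% The client posts to the buffer owned by the scheduler it is currently
    %% running on, spreading load without a central dispatcher.
    pick_box() ->
        SchedId = erlang:system_info(scheduler_id),
        list_to_existing_atom("post_office_box_" ++ integer_to_list(SchedId)).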
Then again that might be overkill.
Yeah, it's different if it's all nodes writing to a single process. In the worst case the socket buffer becomes your backpressure mechanism, but you might still need to protect the central process. It's a bit tricky.
The protocol between pobox and its owner is not ideal for cross-node communication or high-latency links, so that would need to be addressed.
Yeah, I'd have to say that has never really been the intent. The way the original pattern was used in logplex and the way it's been used (for me) since has always been intra-node buffering, usually in front of a slower network socket.
Yes, and I use it that way also. But I've had success using pobox remotely as well, since the input traffic isn't sufficient to saturate the communication between nodes. So pobox is suitable between nodes under some circumstances. I could explain my system in more detail, but that would be best done on a call; writing it down would be too time-consuming and maybe not appropriate over a public medium. By the way, I am in Ottawa, so if you ever stop by, feel free to reach out. I figure I owe you a 🍺 or two for all the contributions you've made over the years.
In any case, if you have no objections, I would like to add the weighting as a starting point, and perhaps we can consider adding early detection of a full buffer when the client is on the same node.
Yeah, I think the weighing makes sense to add in the first place; regardless of location, it can be useful and act as a good buffer. As in my post, I think that having it calculated client-side (with a default of 1 if unspecified and range validation server-side) is probably best.