ic icon indicating copy to clipboard operation
ic copied to clipboard

feat: Introduce msg_limit for application subnets in payload builder

Open stiegerc opened this issue 1 year ago • 2 comments

The begin of a stream slice we induct is used for garbage collecting signals. Therefore if we want at most 50k signals in the reverse stream, we have to make sure the stream index of the last message in the slice is at most slice_begin + 50_000.

Since reverse_stream.signals_end >= slice_begin we may have signals remaining after gc'ing, so the number of messages we can include in a slice is given by slice_end - reverse_stream.signals_end or slice_begin + 50_000 - reverse_stream.signals_end.

This PR adds a new function get_signal_end() that computes an upper limit for how many messages can be included into a stream slice using the logic explained above. Note that this can not lead to a lock scenario, because a case where we can't include anymore messages can always be resolved by moving the begin in the incoming slice, i.e. by garbage collecting on the sending subnet.

stiegerc avatar Oct 02 '24 12:10 stiegerc

After inducting a stream slice, the number of signals in the outgoing stream is equal to number_of_signals_after_gcing + number_of_messages_inducted. So if we want to cap the number of signals in a stream at 50k, we can induct no more than 50k - number_of_signals_after_gcing number of messages.

Gc'ing signals is done using begin in the stream slice, so number_of_signals_after_gcing = outgoing_stream_signals_end - incoming_slice_begin; the signals_end of the outgoing stream is given by message_index of ExpectedIndices.

I find this very hard to parse. A simpler way of putting it is that if we want to limit the outgoing signals to 50k, then we should not induct messages beyond index incoming_stream_begin + 50k (do note that that's "incoming stream begin", which is different from "incoming slice begin"). How many (or rather which) messages we can induct, directly follows from the above (from wherever we are now to incoming_stream_begin + 50k).

alin-at-dfinity avatar Oct 05 '24 06:10 alin-at-dfinity

AFAICT msg_limit in take_prefix() is now always Some(_) so the Option<_> can be removed, but I left it in since I am not sure whether we may want to keep it as it is and also to keep the diff at a minimum.

stiegerc avatar Oct 15 '24 15:10 stiegerc

LGTM overall, I left a comment about the index juggling that I think can be simplified. What's unclear to me is whether we issue a warning if the stream itself is above MAX_STREAM_MESSAGES, IIUC this should not happen with honest subnets?

Oversized streams are a normal part of operation, but they can only do that with reject responses. So before reject signals for requests were introduced: If you can't deliver a request for whatever reason, the subnet generated a reject response and pushed it into the stream back to the subnet where it came from, this had to be done even if the stream was at capacity, hence the stream got oversized.

Now with reject signals for requests we have a similar situation when we get a reject signal for which we generate a reject response locally. If the canister has migrated in the mean time, we have to push it into the stream to its new subnet instead, even if the stream is full.

edit: The new messaging model may bring some of this back. In any case, it is supported if the need arises. This PR just makes the number of signals capped at a reasonable level, i.e. 10k or what it would be if not for oversized streams.

edit: And just to reiterate: This doesn't introduce any lock scenarios because we can hit this limit only if a subnet sends a lot of messages without garbage collecting them on it's side for which it would need our signals and after which we can garbage collect the signals on our side, which in turn pushes us below the limit again. So a change in behavior from just one side without having to await anything can resolve this at all times. All it means is that we insist the other subnet garbage collect its messages within a buffer of 10k messages. This is quite generous and we should basically never come into a situation where this would hamper XNet traffic in any way.

stiegerc avatar Oct 29 '24 13:10 stiegerc

Thanks! I missed the case of a migrating canister, a warning doesn't make sense if this can happen as part of normal operation (even if subnet splitting is a bit of special case)

oggy-dfin avatar Oct 30 '24 10:10 oggy-dfin