Error when sending large amounts of messages: `libp2p_gossipsub::handler] Message exceeded the maximum transmission size and was not sent.`

Open Frederik-Baetens opened this issue 3 years ago • 4 comments

I'm publishing a struct serialized with bincode of the following form:

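use serde::{Deserialize, Serialize};
use std::net::SocketAddr;

// Payload published over gossipsub (serialized with bincode).
#[derive(Serialize, Deserialize, Debug)]
pub struct CreatedPrefixNode {
    pub prefix: String,
    pub address: SocketAddr,
    pub version: u32,
}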

The strings are never more than 20 characters long (I generate them myself).

When publishing a small number of these messages (up to 10K), I get no issues, but when publishing about 100K messages in quick succession, I get the error `libp2p_gossipsub::handler] Message exceeded the maximum transmission size and was not sent.`

Does gossipsub do any kind of message grouping that may lead to issues like this when sending lots of messages?

Frederik-Baetens, Apr 02 '22 18:04

@divagant-martian or @AgeManning might be able to help here.

mxinden, Apr 05 '22 19:04

Hey, this is the error we get from the codec when a message is too big. We don't group published messages before sending them. Can you provide some code we could check?

divagant-martian, Apr 05 '22 20:04

@divagant-martian Yeah, the publishing is done here, from a struct sent through a channel: https://gitlab.com/freddyb/scuddb/-/blob/master/server/src/metacom/swarmhandle.rs#L75

And you can see the struct creation here: https://gitlab.com/freddyb/scuddb/-/blob/master/server/src/metacom/swarmhandle.rs#L75

I'm just running a benchmark with 18-character strings. The strings are created here, and eventually end up being published through gossipsub as part of that struct in the channel.

I'll try to do some debugging to see if I can display some more information about the message being sent when I get that error. If that doesn't lead anywhere, I'll try to produce a smaller reproduction.

Frederik-Baetens, Apr 05 '22 20:04

We publish each message as you submit it, and we check that each message is below `max_transmit_size` before publishing, so we would know if the messages you are sending were too large.

There is one area where we group messages. These are control messages, more specifically IHAVE/IWANT messages. We send these each heartbeat and try to group them into one RPC. I saw you have set your heartbeat to 10 seconds, so with a large number of messages running around the network, I imagine you have a very large pool of control messages.
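A back-of-the-envelope sketch of that buildup, in Rust. Every number here is an assumption for illustration: the publish rate, the 40 bytes per advertised message ID, and the helper names are all hypothetical, and the estimate ignores any caps gossipsub applies to gossip. The 65536-byte limit is what I understand the default `max_transmit_size` to be.

```rust
/// Assumed default `max_transmit_size` in bytes (treat as an assumption).
const ASSUMED_MAX_TRANSMIT_SIZE: usize = 65_536;

/// Number of message IDs that accumulate between two heartbeats
/// at a steady publish rate.
fn ids_per_heartbeat(messages_per_sec: usize, heartbeat_secs: usize) -> usize {
    messages_per_sec * heartbeat_secs
}

/// Rough size of a single control RPC that advertises every pending ID,
/// assuming a flat `bytes_per_id` cost per advertised ID.
fn estimated_control_bytes(pending_ids: usize, bytes_per_id: usize) -> usize {
    pending_ids * bytes_per_id
}

/// True if the estimated control RPC would not fit into one transmission.
fn needs_fragmentation(estimated_bytes: usize) -> bool {
    estimated_bytes > ASSUMED_MAX_TRANSMIT_SIZE
}
```

With 10,000 messages per second and a 10 second heartbeat, that is 100,000 pending IDs and roughly 4 MB of IDs per heartbeat under these assumptions, far above a 64 KiB limit, so the control traffic would have to be split across many RPCs.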

When gossipsub tries to send a large number of control messages in one hit, it tries to fragment them into smaller individual RPCs so that each one stays under the max transmit size. It does this here: https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/behaviour.rs#L2916
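To make the idea concrete, here is a minimal sketch of size-bounded batching. It is not the code behind the link above: `MessageId`, `PER_ID_OVERHEAD`, and `fragment_ids` are hypothetical names, and the per-ID overhead is an illustrative constant rather than the real protobuf encoding cost.

```rust
/// Hypothetical stand-in for an advertised message ID: just raw bytes.
type MessageId = Vec<u8>;

/// Assumed fixed framing overhead per ID, in bytes (illustrative only).
const PER_ID_OVERHEAD: usize = 2;

/// Splits `ids` into batches whose estimated encoded size stays at or
/// below `max_size`. IDs that cannot fit even on their own are returned
/// separately instead of being placed in a batch.
fn fragment_ids(
    ids: Vec<MessageId>,
    max_size: usize,
) -> (Vec<Vec<MessageId>>, Vec<MessageId>) {
    let mut batches = Vec::new();
    let mut oversized = Vec::new();
    let mut current: Vec<MessageId> = Vec::new();
    let mut current_size = 0usize;

    for id in ids {
        let cost = id.len() + PER_ID_OVERHEAD;
        if cost > max_size {
            // This single entry can never be sent within the limit.
            oversized.push(id);
            continue;
        }
        if current_size + cost > max_size {
            // Close the current batch and start a new one.
            batches.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += cost;
        current.push(id);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    (batches, oversized)
}
```

For example, ten 8-byte IDs with a 32-byte limit cost 10 bytes each under these assumptions, so they split into batches of 3, 3, 3, and 1. A bug in this kind of accounting (for instance, underestimating the per-entry overhead) would let a batch exceed the limit, which is the sort of failure suspected here.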

It sounds like that function may not be working as expected in some cases. Some extra logs might be handy, though if this is the cause, it's unlikely you'll see anything. I suspect that if you lower your heartbeat interval, you'll reduce the likelihood of this buildup and of having to fragment RPCs.

Another temporary workaround would be to increase the `max_transmit_size` in the config, but the fragmentation should be working, so I'll look into it.
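A sketch of both workarounds together, assuming the `GossipsubConfigBuilder` API from the libp2p releases current at the time of this thread (newer releases rename it to `gossipsub::ConfigBuilder` and change the error type). The 1 second heartbeat and 1 MiB limit are arbitrary illustrative values, not recommendations, and `build_config` is a hypothetical helper name.

```rust
use std::time::Duration;

use libp2p::gossipsub::{GossipsubConfig, GossipsubConfigBuilder};

/// Builds a gossipsub config with a shorter heartbeat (less control-message
/// buildup per heartbeat) and a larger per-RPC size limit.
fn build_config() -> Result<GossipsubConfig, &'static str> {
    GossipsubConfigBuilder::default()
        // Assumed value: flush IHAVE/IWANT gossip every second instead of every 10.
        .heartbeat_interval(Duration::from_secs(1))
        // Assumed value: raise the maximum RPC size to 1 MiB.
        .max_transmit_size(1024 * 1024)
        .build()
}
```

Note that `max_transmit_size` has to be raised on every peer that should accept the larger RPCs, since the receiving side enforces its own limit.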

AgeManning, Apr 06 '22 04:04