nimbus-eth2
nimbus-eth2 copied to clipboard
Avoid double decompression of gossip messages
Gossip messages are decompressed twice in the current flow:
During message id generation: https://github.com/status-im/nimbus-eth2/blob/74f2350a2c85b60230e2d181ef34fe1831a5ba1a/beacon_chain/networking/eth2_network.nim#L1846
Once during validation: https://github.com/status-im/nimbus-eth2/blob/74f2350a2c85b60230e2d181ef34fe1831a5ba1a/beacon_chain/networking/eth2_network.nim#L1952
Once would be enough, really - messages that fail to decompress are rejected - they really don't need a message ID because they end up rejected anyway. Rejecting and not storing their id in the seen cache also avoids wasting memory on junk.
cc @Menduist @dryajov
Ok, we can avoid storing rejected messages, but not sure how we can prevent double compression in libp2p?
We need to compute the message id to dedup before validation, unless you want to validate 8* more messages. Or I missed something?
avoid storing rejected
yeah, the validation would have to return a ValidationResult that would affect scoring as well
Or I missed something?
well, it would need some new API to do this efficiently I guess - I believe rust-libp2p allows this optimization somehow, for example
Ok, so msgIdProvider could fail (we can use an exception, or a ValidationResult), and in that case we skip immediately And if the regular validation failed, we also don't store
Lgty?
And if the regular validation failed, we also don't store
we should still store if regular validation fails: we want to avoid validating failed messages repeatedly - the message id however, we can't avoid doing repeatedly regardless of failure or not
This is possible to do with observers which allow modifying the received or sent message. Observers are chained, so it's possible to modify the message in different ways.
Observers for incoming messages are called before validation happens, so it's possible to decompress the message there, observers for outgoing (sent or forwarded) messages are called on the raw message, before encoded to protobuf, so it's also possible to compress the message there if desired.
Observers are defined as:
PubSubObserver* = ref object
onRecv*: proc(peer: PubSubPeer; msgs: var RPCMsg) {.gcsafe, raises: [Defect].}
onSend*: proc(peer: PubSubPeer; msgs: var RPCMsg) {.gcsafe, raises: [Defect].}
and can be added/removed with:
proc addObserver*(p: PubSub; observer: PubSubObserver)
proc removeObserver*(p: PubSub; observer: PubSubObserver)
This is possible
this doesn't seem to cover scoring though - data that can't be decompressed means a reject -> score loss. Also, we don't want to compress the data again for rebroadcast - then it's better to decompress twice.
Invalid message are now rejected early, https://github.com/status-im/nimbus-eth2/pull/3438
Invalid message are now rejected early
Valid messages still go through two snappy decompressions
Oh ok, didn't understand that you also wanted to avoid that. So libp2p should allow you to return custom data from the messageId provider, and forward it back to the validator?
forward it back to the validator?
something like that, yeah - it's important that we don't do "double" processing - at the same time, we have to be mindful that copying the data back to libp2p and then to the handler might be worse than decompressing twice, if done inefficiently, so doing this properly would require some benchmarks that we don't presently have.