Interop: Block Builder Denial of Service Protection
The validity of an interop block depends on the invariant that every executing message has a valid initiating message, and finding the executing messages in a transaction requires executing it. This creates a denial of service risk for the sequencer/block builder. The best way to fight denial of service attacks is horizontal scalability. An approach was proposed in https://github.com/ethereum-optimism/design-docs/pull/7, but it was ultimately decided that we do not want to introduce more microservices into the architecture.
Instead, we can add some logic directly to the op-geth mempool. If op-geth is configured with the op-supervisor RPC URL, the mempool executes each inbound transaction on the latest state and then checks with the supervisor that every executing message has a valid initiating message before admitting the transaction. This allows a simple cloud architecture where sentry nodes are deployed in front of the sequencer and provide the denial of service protection.
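A rough sketch of what that ingress check could look like, to make the flow concrete. All of the names here (`Supervisor`, `CheckMessages`, `checkIngress`, the simplified `Log` type, the `simulate` callback) are hypothetical placeholders, not the real op-geth or op-supervisor APIs:

```go
package mempoolfilter

import (
	"context"
	"errors"
	"fmt"
)

// Log is a minimal stand-in for an EVM log emitted during simulation;
// in op-geth this would be types.Log.
type Log struct {
	Address string
	Topics  []string
	Data    []byte
}

// Supervisor is a hypothetical client for the op-supervisor RPC.
// The CheckMessages method name is illustrative, not the real API.
type Supervisor interface {
	CheckMessages(ctx context.Context, logs []Log) error
}

var errInvalidExecutingMessage = errors.New("executing message has no valid initiating message")

// checkIngress executes the inbound transaction on a throw-away copy of the
// latest state (simulate is a placeholder for that step) and asks the
// supervisor whether every executing message it emits has a valid
// initiating message. Any failure keeps the tx out of the mempool.
func checkIngress(ctx context.Context, sup Supervisor, rawTx []byte,
	simulate func([]byte) ([]Log, error)) error {
	logs, err := simulate(rawTx)
	if err != nil {
		return fmt.Errorf("simulation failed: %w", err)
	}
	if err := sup.CheckMessages(ctx, logs); err != nil {
		return fmt.Errorf("%w: %v", errInvalidExecutingMessage, err)
	}
	return nil
}
```

Because the check runs on every node that accepts transactions, the same code path gives sentry nodes the filtering they need without any new service.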
May I work on this issue?
Sure - can we work on a small design doc here before starting? It should include specifics around the implementation, like an overview of what the diff would look like. Just to make sure we know the full scope of the problem.
Cool, will work on it. Can you please share your Telegram ID for smooth discussion?
Please follow up with a design doc directly in this thread; we are not at a stage where we can provide individual support.
This is a duplicate of https://github.com/ethereum-optimism/optimism/issues/10890
https://github.com/ethereum-optimism/op-geth/pull/422
Here's an implementation that rejects transactions as they enter the mempool if any "Ingress Filter" denies the tx. The one Ingress Filter I wrote collects the logs from executing the transaction and passes them through the Supervisor.
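For readers who don't want to open the diff, this is roughly the shape of the hook: a list of filters that each get a chance to veto a transaction on the way into the pool. The interface name, signature, and `Tx` type below are an illustrative sketch, not the exact code in the PR:

```go
package mempoolfilter

import "context"

// Tx is a minimal stand-in for an op-geth pool transaction.
type Tx struct{ Raw []byte }

// IngressFilter sketches the hook described above: each filter may veto a
// transaction as it enters the tx pool.
type IngressFilter interface {
	// FilterTx reports whether the transaction may be admitted.
	FilterTx(ctx context.Context, tx *Tx) bool
}

// filterFunc adapts a plain function (e.g. a supervisor-backed check like
// the one sketched in the issue body) into an IngressFilter.
type filterFunc func(context.Context, *Tx) bool

func (f filterFunc) FilterTx(ctx context.Context, tx *Tx) bool { return f(ctx, tx) }

// admit runs a transaction through every configured filter and reports
// whether it may enter the mempool.
func admit(ctx context.Context, tx *Tx, filters []IngressFilter) bool {
	for _, f := range filters {
		if !f.FilterTx(ctx, tx) {
			return false
		}
	}
	return true
}
```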
Besides checking on ingress, we should re-evaluate a tx that has been sitting in the mempool for a while. The block-building code path will also invalidate a tx, but as a verifier node we need to drop these from the mempool more proactively, so that final block building doesn't get hit with this work.
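A minimal sketch of that proactive re-check, reusing the hypothetical `Tx` stand-in from the sketch above; `pending`, `check`, and `drop` are placeholders for the pool's own accessors, not real op-geth functions:

```go
package mempoolfilter

import (
	"context"
	"time"
)

// revalidate periodically re-runs the ingress check over the pending
// transactions and drops any that no longer pass, so block building
// never has to discard them itself.
func revalidate(ctx context.Context, interval time.Duration,
	pending func() []*Tx,
	check func(context.Context, *Tx) bool,
	drop func(*Tx),
) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, tx := range pending() {
				if !check(ctx, tx) {
					drop(tx) // remove proactively, before block building sees it
				}
			}
		}
	}
}
```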
While I agree that it would be ideal for the filter to re-check transactions frequently, I don't know that prioritizing that additional feature is the right call until we know it's needed. I'd like to come up with a way to determine whether it is.
There are two ways I can think of that a transaction which previously passed the filter check would change to failing:
- The chain which holds the initiating event experienced a reorg, so the referenced data is no longer in the expected location.
- The transaction in question changes which executing messages it creates because the EVM state has changed since it was last checked.
The reorg case doesn't seem like a compelling reason to re-run the filter, as we expect reorgs to be very rare, and in that case the block builder can handle those transactions.
The changing-tx case is more relevant, but I still think we shouldn't over-optimize the filter until we see that this sort of thing actually happens. The filter works by simulating each transaction on a throw-away state, which is not trivial (as a block's worth of transactions enters the mempool, the filter effectively does the work of evaluating a whole block, resetting the state between every tx). Re-evaluating the transactions already in the mempool would add even more execution load.
In summary, I'm not opposed to expanding the filter to re-evaluate transactions over time, but I want to see that we need it before adding the complexity, considering we have no production data yet and it would directly multiply the amount of work our executing engines do.
Closing due to the access list design