aligned_layer
bug: batcher DoS if batch with max_batch_size is sent twice
Reported by cinderblock in Cantina issue #59. A transcript of its description follows:
Batcher submissions will fail if the exact same batch merkle root has already been submitted on-chain, because of the revert in AlignedLayerServiceManager.createNewTask(). When the call reverts, the batcher keeps retrying to resubmit the same batch with the same merkle root for as long as no other proofs fit into the batcher queue (a minimal sketch of this retry loop follows the list below). An attacker can exploit this to block the batcher's queue for all users, taking into consideration that:
- batches submitted on-chain and to S3 are limited to 256 MiB, and a single proof is limited to 16 MiB (see note 1)
- the batch queue is ordered from lowest to highest max_fee (as specified by users when submitting a proof to the batcher)
The attacker also needs to deposit funds in the BatcherPaymentService contract to inflate max_fee; on a successful attack he blocks all batch processing and can unlock his funds later on.
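A minimal sketch of the retry behavior described above, assuming a simplified submission loop and a stand-in for the on-chain call (all names are hypothetical, not the actual aligned_layer batcher code):

```rust
/// Stand-in for the on-chain call to AlignedLayerServiceManager.createNewTask():
/// it reverts (returns Err) whenever the same merkle root was already submitted.
fn create_new_task(root: [u8; 32], submitted: &mut Vec<[u8; 32]>) -> Result<(), &'static str> {
    if submitted.contains(&root) {
        return Err("revert: batch with this merkle root already exists");
    }
    submitted.push(root);
    Ok(())
}

/// Hypothetical submission loop: the batcher retries the same batch unchanged,
/// so with a duplicate root it never makes progress.
fn submit_finalized_batch(root: [u8; 32], submitted: &mut Vec<[u8; 32]>) {
    for attempt in 1.. {
        match create_new_task(root, submitted) {
            Ok(()) => {
                println!("batch submitted on attempt {attempt}");
                return;
            }
            Err(e) => eprintln!("attempt {attempt}: createNewTask reverted ({e})"),
        }
        if attempt == 3 {
            // Cap only so this sketch terminates; per the report, the real
            // batcher keeps retrying, so the queue behind it never drains.
            return;
        }
    }
}

fn main() {
    let mut submitted = Vec::new();
    let root = [0xAB; 32]; // merkle root of the attacker's batch
    submit_finalized_batch(root, &mut submitted); // first submission succeeds
    submit_finalized_batch(root, &mut submitted); // duplicate root: stuck retrying
}
```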
Steps for the attack:
1. Submit a 16 MiB proof 16 times; this should cost (0.0013 * 16) ether, or ~$76, since the fee amount depends on the number of proofs in the queue and size is not taken into consideration.
2. Once the batch has been processed and submitted on-chain, the attacker locks 8 ether in the BatcherPaymentService contract and resubmits the exact same 16 MiB proof 16 times, this time with max_fee set to 0.5 ether to make sure his proofs get the highest priority in the batcher's queue.
3. Now the queue is DoSed: the batcher is stuck forever retrying to resubmit the same merkle root as in step 1, failing each time because of the revert in AlignedLayerServiceManager.createNewTask(). And since the attacker submitted the 16 proofs with a max_fee of 0.5 ether, those 16 proofs will always have priority in the batcher's queue unless someone actually submits a proof with a max_fee above 0.5 ether (see the queue sketch after these steps).
4. The attacker keeps sending the same 16 MiB proof indefinitely, and the batcher eventually runs out of memory.
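To illustrate steps 2-3, a sketch of the fee-ordered selection, assuming a plain max-heap as a stand-in for the batcher's real queue (for reference, step 1's cost checks out: 16 * 0.0013 = 0.0208 ether, which matches ~$76 at an ETH price of roughly $3,650 implied by the report):

```rust
use std::collections::BinaryHeap;

// Fee-priority queue sketch; `QueuedProof` and the heap are stand-ins, not the
// batcher's actual data structures. Fees are in wei so they compare as integers.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct QueuedProof {
    max_fee_wei: u128, // compared first, so the max-heap pops the highest fee
    label: &'static str,
}

fn main() {
    const ETHER: u128 = 1_000_000_000_000_000_000;
    let mut queue = BinaryHeap::new();

    // An honest user's proof at the report's baseline fee of 0.0013 ether.
    queue.push(QueuedProof { max_fee_wei: 13 * ETHER / 10_000, label: "honest proof" });
    // The attacker's 16 resubmitted duplicates at 0.5 ether each.
    for _ in 0..16 {
        queue.push(QueuedProof { max_fee_wei: ETHER / 2, label: "attacker duplicate" });
    }

    // Batches are assembled from the highest-fee proofs first, so the 16
    // duplicates are always selected, their batch root repeats the earlier
    // one, createNewTask() reverts, and the honest proof never gets a turn.
    for _ in 0..3 {
        let next = queue.pop().unwrap();
        println!("next: {} ({} wei)", next.label, next.max_fee_wei);
    }
}
```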
Notes: 1. The 16 MiB limit is the default limit for a WebSocket frame in the tungstenite-rs library; a message can actually be 64 MiB using multiple frames, which would reduce the attacker's cost by a few tens of dollars, but to keep the PoC simpler we send 16 MiB proofs.
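For context, these are the tungstenite size knobs behind note 1; the field names below match the 0.2x releases of the library, and the exact configuration API may differ in other versions:

```rust
use tungstenite::protocol::WebSocketConfig;

fn main() {
    // tungstenite's defaults: 16 MiB per frame and 64 MiB per message, where a
    // single message may be split across several frames.
    let mut config = WebSocketConfig::default();
    config.max_frame_size = Some(16 << 20);   // the 16 MiB limit used in the PoC
    config.max_message_size = Some(64 << 20); // the multi-frame ceiling from note 1

    // A maximum-size message carries 4x the payload of a single frame, which is
    // where the "few tens of dollars" saving for the attacker comes from.
    let frames = config.max_message_size.unwrap() / config.max_frame_size.unwrap();
    println!("frames per max-size message: {frames}");
}
```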
A possible solution is to add a salt proof to every batch, generated by the batcher, with the single constraint that salt != 0; this adds negligible cost while making the attack expensive.
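A minimal sketch of that mitigation, assuming the batcher folds one random salt leaf into the batch before merkleizing (DefaultHasher stands in for the real merkleization, e.g. keccak over the proof leaves, and all names are hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for computing a batch merkle root over the proof leaves plus one
/// batcher-generated salt leaf. With salt != 0 guaranteed, resubmitting the
/// exact same proofs still yields a fresh root, so createNewTask() no longer
/// reverts on a replayed batch.
fn salted_batch_root(proof_hashes: &[u64], salt: u64) -> u64 {
    assert_ne!(salt, 0, "the single constraint from the report: salt != 0");
    let mut h = DefaultHasher::new();
    proof_hashes.hash(&mut h);
    salt.hash(&mut h); // the extra salt leaf
    h.finish()
}

fn main() {
    let proofs = [0xAAAA_u64, 0xBBBB, 0xCCCC]; // stand-ins for proof commitments
    // In the real batcher the salt would come from a CSPRNG; fixed values here
    // keep the sketch deterministic.
    let root_a = salted_batch_root(&proofs, 1);
    let root_b = salted_batch_root(&proofs, 2);
    // Identical proof sets, different roots: replaying a batch can no longer
    // trigger the duplicate-root revert.
    assert_ne!(root_a, root_b);
    println!("root_a = {root_a:#x}\nroot_b = {root_b:#x}");
}
```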
The description of the problem is not quite right: it is not enough to have a valid batch in the queue, you need to fully control it, otherwise the "malicious" proofs get mixed with the other ones and the sender of the tasks has to pay. We could add a random proof as a salting mechanism, but it is not extremely critical, and we can later improve the protocol to have salted batches or to accept repetitions.