
Pool protection

Open damip opened this issue 3 years ago • 4 comments

Op flooding

The operation pool is currently unprotected from flooding: someone can send many operations that claim high fees they don't actually have the funds to pay, in order to eliminate all other ops from the pool.

To counter that, we need to add a worker that periodically scans through the pool and checks that the total amount of fees claimed by a given address across all its pool operations is lower than or equal to the sequential balance of that address. Excess operations for that address need to be eliminated from the pool.
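As a minimal sketch (all names and types here are hypothetical, not the massa API), the per-address check such a worker could run, given an address's pool ops sorted from highest to lowest priority and its sequential balance:

type OpId = u64;

/// Keep ops while their cumulated fees still fit in the sequential balance;
/// everything past that point is evicted from the pool.
fn ops_to_evict(ops_by_priority: &[(OpId, u64 /* fee */)], balance: u64) -> Vec<OpId> {
    let mut claimed: u64 = 0;
    let mut evicted = Vec::new();
    for &(id, fee) in ops_by_priority {
        match claimed.checked_add(fee) {
            Some(total) if total <= balance => claimed = total, // still affordable: keep
            _ => evicted.push(id), // exceeds the balance (or overflows): evict
        }
    }
    evicted
}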

A more subtle variant

A more subtle variant of this attack consists in doing the following:

  • have many addresses, each with a decent sequential balance
  • flood with 1 op per address, with a higher fee than everyone else
  • make sure that the validity periods of the ops start some time in the future
  • then, just before the validity range of the ops starts, move all the sequential coins to new addresses
  • this will make the ops dominate the pool for a potentially long time

To counter it, the only option I see for now is to drop operations whose validity range starts more than 1 period in the future. Such ops can be ignored directly when they are created from the API or received from protocol. That way, the pool never sees them.
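A sketch of that ingress filter (period arithmetic simplified; names are assumptions):

/// Accept only ops whose validity starts now or at most one period ahead;
/// anything starting further in the future is dropped at API/protocol ingress.
fn accept_op(validity_start_period: u64, current_period: u64) -> bool {
    validity_start_period <= current_period.saturating_add(1)
}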

damip avatar Jul 29 '22 08:07 damip

To counter it, the only option I see for now is to drop operations whose validity range starts more than 1 period in the future.

If you do that, will it be a problem for async smart contracts, or is it unrelated?

this will make the ops dominate the pool for a potentially long time

I don't understand why, because once the op's validity range has started and the operation is the most profitable, it will be executed and will fail because there is not enough gas.

AurelienFT avatar Aug 02 '22 08:08 AurelienFT

To counter it, the only option I see for now is to drop operations whose validity range starts more than 1 period in the future.

If you do that, will it be a problem for async smart contracts, or is it unrelated?

Async smart contracts use a separate, deterministic pool that is not affected by this attack, as all the funds required for running the async messages are pre-reserved on message emission.
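For illustration only (the names below are assumptions, not the actual massa API), the reason the async pool is immune is that the whole budget is debited up front, at emission time:

struct Account { balance: u64 }

enum EmitError { InsufficientFunds, Overflow }

/// Debit fee + coins at emission: a message that cannot pay never enters the pool.
fn reserve_async_budget(sender: &mut Account, fee: u64, coins: u64) -> Result<(), EmitError> {
    let budget = fee.checked_add(coins).ok_or(EmitError::Overflow)?;
    if sender.balance < budget {
        return Err(EmitError::InsufficientFunds); // rejected before pooling
    }
    sender.balance -= budget; // pre-reserved: the pooled message is fully paid for
    Ok(())
}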

this will make the ops dominate the pool for a potentially long time

I don't understand why, because once the op's validity range has started and the operation is the most profitable, it will be executed and will fail because there is not enough gas.

The idea of the attack is:

  • create 10000 addresses with enough sequential funds
  • emit 1 operation per address that spends more fees than all honest txs in the pool and for which the validity range starts in 10 hours
  • when we reach the last block we are supposed to create within those 10 hours, use the block to move away all the sequential coins from the addresses that were supposed to pay for the fees, for free
  • during the 10 hours, the funds to pay for the high fees of those ops are present, and they are therefore kept in the pool as a priority over honest ops that are eliminated from the pool
  • we can't limit per-address because every op is linked to a unique address
  • if we set a max future time (~1hour), the attacker can simply do the same, but at the 1 hour scale and loop the attack with overlaps to keep the pool unusable all the time

=> result = not a single honest transaction can be included in any block and the attacker loses 0 coins in the process

damip avatar Aug 02 '22 10:08 damip

Pool protection

Today, adding an operation isn't subject to any particular checks concerning the fees. We can't reasonably predict whether, at block construction time, the operations will be valid or not, in particular whether the creator address has enough balance to pay the fees.

The list of operations in the pool is ordered by the size/fee ratio, and at block construction a kind of greedy knapsack is applied: we choose from the ordered set the first ops with a valid period and check whether the creator can pay the fees. But none of these checks writes to the pool or deletes anything.
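A rough sketch of that selection pass (types simplified and hypothetical), mainly to make explicit that it is read-only with respect to the pool:

struct Op { size: u64, start_period: u64, end_period: u64, creator_can_pay: bool }

/// Greedy pass over ops pre-sorted by ratio: take whatever is valid and fits.
fn select_for_block(sorted_ops: &[Op], period: u64, mut space_left: u64) -> Vec<usize> {
    let mut chosen = Vec::new();
    for (i, op) in sorted_ops.iter().enumerate() {
        let valid_now = op.start_period <= period && period <= op.end_period;
        if valid_now && op.size <= space_left && op.creator_can_pay {
            space_left -= op.size;
            chosen.push(i);
        }
        // rejected ops are only skipped here, never removed from the pool
    }
    chosen
}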

Regarding the two points emphasized above, we see two potential problems.

  1. A number of malicious addresses could fill the structure with operations that will become valid, but only after a while, and, if possible, with a large weight.

  2. An address can keep filling the structure with operations indefinitely, while making sure that it never has enough balance to pay the fees.

Some improvements can be made to try to solve these problems.

Fee counter

When adding an operation, we update in a structure the total fees an address would pay if all its operations were executed. When an operation is selected for inclusion in a block, we update the structure too.

A bot frequently wakes up and locks the operation pool to check:

// address_balance_map caches one balance query per address during the scan;
// total_fee(addr) is the running fee counter maintained by the pool (see above)
let mut address_balance_map: HashMap<Address, Amount> = HashMap::new();
for op in selected_operations {
    let balance = *address_balance_map
        .entry(op.address.clone())
        .or_insert_with(|| find_balance(&op.address));
    if total_fee(&op.address) > balance {
        decrease_total_fee(&op.address, op.fee); // the evicted op's fee no longer counts
        remove_op(op); // drop it from the pool
    }
}

The frequency and the quantity of operations scanned can be random within a given range. The frequencies, quantities or ranges could be set in a local configuration file.

The bot will look at the operation pool in chunks, and tries to look at all the operations at least once.

Note: locking the operation pool can be expensive if we try to look at a big chunk. We could think about some kind of sharding of the storage. Likewise, locking a batch of operations with a high ratio would let another thread lock those with a low ratio.

Write to disk

The number of operations can keep growing even if we regularly check for possible attacks. Addresses can also add operations with a valid fee and try to fill the pools. That kind of attack is hard to detect, especially if the ratios are "acceptable" but the delay (validity period start) is too long.

In that case, we can accept only the operations with a short delay, so that they are dropped quickly, and in all cases store them on disk to avoid any massive attack.

A bot can regularly select a random quantity of operations among those with the highest ratios and choose one of the following actions: store them on disk, or swap them with operations already on disk.

At some point, if the pool size decreases, the bot is also able to load data back from disk.

adrien-zinger avatar Sep 21 '22 15:09 adrien-zinger

@adrien-zinger Good proposal. Here are some of my thoughts.

1. If we're only looking at a batch of ops at every iteration, we cannot account for the whole fee spending of a given address.
2. Reading/writing on the disk is super heavy; I would avoid it.

Important note: dropping an op from pool is enough: caches in Protocol will prevent it from being re-sent to pool in the future.

Here is a proposal to change things slightly to make them perform better (a rough Rust sketch follows the list below):

  • keep an index index_by_address: PreHashMap<Address, PoolAddrInfo>

    • struct PoolAddrInfo { max_spending: Amount, ops: PreHashSet<OperationId>, last_robot_scan: Option<(MassaTime, Amount)> }
    • max_spending is the cumulated max balance spending (op.get_max_sequential_spending()) of all ops emitted by this address
    • last_robot_scan is None if there was no scan, and otherwise provides the pair (last_balance_retrieval_time, retrieved_balance)
  • regularly wake up the robot to do the following:

    • take a batch of addresses in index_by_address among the ones for which last_robot_scan is the lowest (None is lower than Some(_))
    • get the batch of balances from Execution for all the addresses of the batch
    • for each address_info in this batch:
      • update address_info.last_robot_scan with the current date and retrieved balance
      • while address_info.max_spending is strictly higher than the balance:
        • drop the lowest-priority op among address_info.ops from the pool
        • remove it from address_info.ops and subtract its max spending from address_info.max_spending
      • if there are no more elements in address_info.ops, delete the complete address_info entry
  • when an operation op from address A is to be added to the pool:

    • reject the op if its validity ended before now()
    • reject the op if its validity starts more than (or equal to) t0 in the future
    • let addr_info = index_by_address.entry(A).or_insert_default()
    • add the op ID to addr_info.ops
    • addr_info.max_spending = addr_info.max_spending.checked_add(op.get_max_sequential_spending()) // if checked_add returns an overflow, ignore the op
    • while addr_info.max_spending is strictly higher than the balance:
      • drop the lowest-priority op among addr_info.ops from the pool
      • remove it from addr_info.ops and subtract its max spending from addr_info.max_spending
    • if the op was not dropped above, add the op to the pool and prune the pool
  • sorted pool index: currently ops are sorted by return-on-investment (RoI) to choose which ones to keep/prune and add to blocks in priority. Make the sort criterion (lowest is better): (scan_age, RoI)

  • we define scan_age: u64 = now.saturating_sub(last_balance_retrieval_time.unwrap_or(MassaTime::MIN)).checked_div(t0).unwrap(): the lower, the higher the priority

  • when pruning the pool:

    • drop all ops whose validity ended before now()
    • drop lowest priority overflowing ops
    • don't forget to prune the various indexes from removed ops (including index_by_address)
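A rough, self-contained Rust sketch of the robot pass described above; the real types (PreHashMap, Amount, MassaTime) and the Execution query are simplified, and the helper names are illustrative assumptions:

use std::collections::{HashMap, HashSet};

type Address = String;
type OperationId = u64;
type Amount = u64;
type MassaTime = u64;

struct PoolAddrInfo {
    max_spending: Amount,                         // sum of op.get_max_sequential_spending()
    ops: HashSet<OperationId>,
    last_robot_scan: Option<(MassaTime, Amount)>, // (retrieval time, retrieved balance)
}

struct Pool {
    index_by_address: HashMap<Address, PoolAddrInfo>,
}

impl Pool {
    /// One robot iteration over a batch of addresses.
    fn robot_scan(&mut self, batch_size: usize, now: MassaTime) {
        // pick the addresses whose last scan is oldest (None sorts before Some)
        let mut addrs: Vec<Address> = self.index_by_address.keys().cloned().collect();
        addrs.sort_by_key(|a| self.index_by_address[a].last_robot_scan.map(|(t, _)| t));
        addrs.truncate(batch_size);

        for addr in addrs {
            let balance = query_execution_balance(&addr); // assumed Execution query
            let info = self.index_by_address.get_mut(&addr).unwrap();
            info.last_robot_scan = Some((now, balance));
            // evict the lowest-priority ops until the address can afford the rest
            while info.max_spending > balance {
                let Some(op_id) = lowest_priority_op(&info.ops) else { break };
                info.ops.remove(&op_id);
                info.max_spending = info.max_spending.saturating_sub(op_max_spending(op_id));
                drop_op_from_pool(op_id);
            }
            if info.ops.is_empty() {
                self.index_by_address.remove(&addr);
            }
        }
    }
}

// Stubs standing in for the real pool/Execution plumbing:
fn query_execution_balance(_a: &Address) -> Amount { 0 }
fn lowest_priority_op(ops: &HashSet<OperationId>) -> Option<OperationId> { ops.iter().min().copied() }
fn op_max_spending(_op: OperationId) -> Amount { 0 }
fn drop_op_from_pool(_op: OperationId) {}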

Future ideas: make the pool draw-aware:

  • check when are the next slots our addresses will be creating blocks
  • if none, no need to run the pool
  • if some, only focus on keeping ops that would fit in the slots we are selected for

damip avatar Sep 21 '22 18:09 damip

Solution in https://github.com/massalabs/massa/issues/4009

damip avatar Jun 05 '23 09:06 damip