solana icon indicating copy to clipboard operation
solana copied to clipboard

Enforcing max account data cap may introduce indeterminism

Open jstarry opened this issue 2 years ago • 7 comments

Problem

While processing transactions, account data deltas are recorded at every instruction boundary by calling adjust_delta in InvokeContext::verify https://github.com/solana-labs/solana/blob/37f4621c064b0f2dbc596618b4c31f38bbc78afe/program-runtime/src/accounts_data_meter.rs#L81-L87

If the cluster is very near the max account data cap, a single transaction instruction could allocate enough data to exceed the limit and result in a failed transaction. However, this error could be indeterministic if a batch of transactions contains a transaction which allocates data and a transaction that frees data. Since validators process parallelizable transactions in a random order, some validators could free data before allocating and not hit the max account data limit while other validators could allocate data first and hit the limit. This would result in different execution results for the same block.

Proposed Solution

  • Only check the max account data cap after processing the entire block
  • Add a transaction specific data meter that limits how much data can be allocated while processing a single transaction

jstarry avatar Jul 06 '22 11:07 jstarry

Yep, good catch. This specific case needs to be handled correctly.

Some more context:

There are checks/limits for both total accounts data size, and per block accounts data size. Both of these are checked in Replay Stage for this reason. And the "total" one includes more discussion context:

  • total: https://github.com/solana-labs/solana/pull/23422
  • per block: https://github.com/solana-labs/solana/pull/25524

There are differences for total and per block that pertain to this issue. The per-instruction error is only thrown if the total accounts data size limit is exceeded. Whereas with per block, we only check at the end of execute_batch() https://github.com/solana-labs/solana/blob/6f5857a5dba8bb75f5a56bf6210142b4db86f182/ledger/src/blockstore_processor.rs#L241

Since per block there isn't an instruction error thrown, I think #25517 is safe. This is the one I currently have in the feature activation schedule.

For total, I do not have that one on the activation schedule, because I am still resolving issues around calculating the accounts data size at boot, i.e. the initial size for a bank. I've add this issue to get tracked in https://github.com/solana-labs/solana/issues/21604.

As for a solution, I think I can safely remove the instruction error from the instruction processing and defer checks to the end, in the same way it is done for per block checks now. Wdyt?

brooksprumo avatar Jul 06 '22 13:07 brooksprumo

I think that https://github.com/solana-labs/solana/issues/25517 is also unsafe because sometimes consecutive batches are processed in parallel if they don't conflict. If one batch allocates and another batch frees, validators could encounter a race condition there as well.

jstarry avatar Jul 06 '22 14:07 jstarry

[...] sometimes consecutive batches are processed in parallel if they don't conflict. If one batch allocates and another batch frees, validators could encounter a race condition there as well.

Ooo, OK yeah, that's an issue. Will dig in.

brooksprumo avatar Jul 06 '22 14:07 brooksprumo

@brooksprumo do the proposed solutions make sense? I think the account data metering used during transaction processing can be changed to impose a tx specific limit of like 100MB rather than using the global max cap.

Also want to call out that https://github.com/solana-labs/solana/pull/26438 has mimicked the current broken implementation by tracking the remaining block account data in the TransactionContext so it will need to be fixed as well

https://github.com/solana-labs/solana/blob/b4b1f95b6a408a1472646aafe386d2f8daa1f75d/runtime/src/bank.rs#L4282-L4288

jstarry avatar Jul 08 '22 08:07 jstarry

@brooksprumo do the proposed solutions make sense?

Yep! Working on moving the replay stage checks to replay_active_bank() once the bank is complete. Here's a bit more context: https://discord.com/channels/428295358100013066/439194979856809985/994646411234775110

I think the account data metering used during transaction processing can be changed to impose a tx specific limit of like 100MB rather than using the global max cap.

There is a 100 MB block limit in additional to the global limit. I think a per-transaction limit could be added as well. That probably should have its own discussion and/or feature-gate.

Also want to call out that #26438 has mimicked the current broken implementation by tracking the remaining block account data in the TransactionContext so it will need to be fixed as well

https://github.com/solana-labs/solana/blob/b4b1f95b6a408a1472646aafe386d2f8daa1f75d/runtime/src/bank.rs#L4282-L4288

Yep! That's part of the refactor from https://github.com/solana-labs/solana/pull/25899. I talked with @Lichtso about changes that will need to happen with the accounts data tracking, so we should be on the same page. In particular, AccountsDataMeter will be going away/absorbed into TransactionContext. I don't think the way this is getting tracked is broken, just the bits that check for errors/throw errors will change. In particular, TransactionContext won't throw instruction errors if the global accounts data size is exceeded; that check will only happen in replay_active_bank() once the bank is complete.

brooksprumo avatar Jul 08 '22 13:07 brooksprumo

I think a per-transaction limit could be added as well. That probably should have its own discussion and/or feature-gate

Could you please open one?

jstarry avatar Jul 08 '22 14:07 jstarry

I think a per-transaction limit could be added as well. That probably should have its own discussion and/or feature-gate

Could you please open one?

Opened #26502 to track this. I will update that issue with more information shortly.

brooksprumo avatar Jul 08 '22 14:07 brooksprumo