# State Witness size limit

## Background
Currently the State Witness is only implicitly limited by gas. In some cases, large contributors to State Witness size are not charged enough gas, which might result in a State Witness that is too big for the network to distribute to all validators in time.
## Proposed solution

### MVP
Limiting the State Witness size is not required for the Stateless Validation MVP/prototype. Also, (1) shows that current mainnet receipts result in a reasonable State Witness size, so this won't be an issue for prototyping.
### Short Term
In the short term (before launching Stateless Validation on mainnet) we need to implement a soft limit for the State Witness size on the runtime side (similar to compute costs). See this comment for more details. This would help protect against bringing down the network with receipts that are specifically crafted to produce a large State Witness.
### Long Term
I believe that in the long term we need to adjust our gas costs to reflect contributions to the State Witness size. This means reintroducing TTN (touching trie node) charges for reads, charging for contract code size on function calls, etc.
## Resources

- (1) Zulip thread with current witness size analysis
- (2) https://github.com/near/nearcore/issues/9378
Note from the onboarding discussion: another approach is to add State Witness size to the compute costs. It should work well enough for the short term and be fairly close to what we want in the long term.
It seems that there are three kinds of objects that contribute to state witness size:
1. Incoming receipts and receipt proofs
2. New transactions
3. `PartialState` produced by executing receipts
We can't really do anything about 1) because there's no global congestion control, which means the queue of incoming and delayed receipts is unbounded, so the size of `source_receipt_proofs` is unbounded as well :/ We'll have to live with this until global congestion control is implemented.
With 2) the situation is better. We control which transactions get added to a chunk, so we could add a size limit for new transactions. In `prepare_transactions` there's already a gas limit and a time limit; we can add a similar size limit: once the added transactions take up more than X MB, we stop adding new ones. AFAIU receipts produced by converting transactions should be rather small, so these receipts shouldn't be a big concern. There's also local congestion control, which helps a bit: it stops adding new transactions when the number of delayed receipts gets too high. But it doesn't really limit the size, so we need an explicit size limit as well.
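A minimal sketch of what such a size budget could look like. All names and types here are illustrative, not the actual `prepare_transactions` signature in nearcore:

```rust
// Hypothetical sketch: stop selecting transactions once their total
// serialized size exceeds a budget, mirroring the existing gas and
// time limits in prepare_transactions.

#[derive(Clone)]
struct SignedTransaction {
    bytes: Vec<u8>, // serialized transaction
}

/// Select transactions for a chunk, respecting a total-size budget.
fn prepare_transactions(
    pool: &[SignedTransaction],
    size_limit: usize,
) -> Vec<SignedTransaction> {
    let mut selected = Vec::new();
    let mut total_size = 0usize;
    for tx in pool {
        let tx_size = tx.bytes.len();
        // Stop once adding the next transaction would exceed the budget.
        if total_size + tx_size > size_limit {
            break;
        }
        total_size += tx_size;
        selected.push(tx.clone());
    }
    selected
}

fn main() {
    let pool: Vec<SignedTransaction> = (0..10)
        .map(|_| SignedTransaction { bytes: vec![0u8; 1000] })
        .collect();
    // With a 4 KB budget only four of the 1 KB transactions fit.
    println!("{}", prepare_transactions(&pool, 4000).len()); // 4
}
```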
We can limit 3) by executing receipts until the `PartialState` gets too large. `TrieRecorder` records how much `PartialState` was produced when executing a receipt, and we can use this information to limit the total size of `PartialState`. The easiest way would be to add a `size_limit` similar to the `gas_limit` and `compute_limit`: once `PartialState` gets too large, stop processing receipts and move the remaining ones to the delayed queue: https://github.com/near/nearcore/blob/33b5bd7a753a90588f7ea986e0b85f20c8c800e0/runtime/runtime/src/lib.rs#L1485
I think this would be good enough for normal, non-malicious traffic, but this kind of limit isn't enough by itself. In Jakob's analysis he found that a single receipt can access as many as 36 million trie nodes, which would produce hundreds of megabytes of `PartialState`. This means that we also need a per-receipt limit: if executing a receipt produces more than X MB of `PartialState`, the receipt is invalid and its execution fails, just like with the 300 TGas limit.
This will be a breaking change: some contracts that worked before could break after the limit is introduced. But I think it's necessary to add it; I don't see any way around it.
There's also the question of what the size limit itself should be. In Jakob's analysis he proposed 45 MB, but that requires a significant amount of bandwidth: sending a 45 MB `ChunkStateWitness` to 30 validators would require at least a 10 Gbit/s connection (!). We've already seen validators start having trouble with 16 MB witnesses, so this limit has to be chosen carefully. The limit also can't be too small, because that would make the per-receipt size limit very small.
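For contrast with the soft limit, the per-receipt hard limit would fail the receipt itself rather than defer work. A trivial sketch, with the outcome type and names purely hypothetical:

```rust
// Hypothetical hard limit: a receipt whose execution records more
// PartialState than the per-receipt limit fails instead of committing.

#[derive(Debug, PartialEq)]
enum ReceiptOutcome {
    Success,
    // Analogous to a receipt failing at the 300 TGas limit.
    SizeLimitExceeded,
}

fn check_receipt_size(recorded_bytes: usize, per_receipt_limit: usize) -> ReceiptOutcome {
    if recorded_bytes > per_receipt_limit {
        ReceiptOutcome::SizeLimitExceeded
    } else {
        ReceiptOutcome::Success
    }
}

fn main() {
    let limit = 20 * 1024 * 1024; // e.g. a 20 MB per-receipt limit
    // A receipt recording ~300 MB of PartialState would be rejected.
    println!("{:?}", check_receipt_size(300 * 1024 * 1024, limit));
    println!("{:?}", check_receipt_size(1024, limit));
}
```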
My rough plan of action would be:
- Use `TrieRecorder` to measure how much `PartialState` each receipt produces. Run some traffic and see what it looks like. Add metrics.
- Add a `size_limit` when applying receipts: the basic limit which stops processing receipts when the size of `PartialState` gets too large. This could be enough to run mainnet traffic smoothly.
- Add a size limit for new transactions: stop adding transactions when they get too large.
- Implement a per-receipt size limit on `PartialState`. This would require careful analysis: it'd be good to go over the blockchain and see if there are any contracts which require > 20 MB of `PartialState` to run. Those could break after introducing the limit, so we must estimate what the impact would be, warn developers, etc.
- Adjust gas costs to reflect how much `PartialState` is produced by executing a receipt. Accessing trie nodes should be as expensive as the resulting size increase warrants.
A quick and hacky size limit example, which stops applying receipts when the size recorded by `TrieRecorder` goes above 5 MB: https://github.com/jancionear/nearcore/commit/6dd9d4fa5bd161558e109d7b6943207e2b057a6c
Updating the project thread.
I've merged PR https://github.com/near/nearcore/pull/10703, which adds a soft limit for storage proof size, as highlighted in point 3 of @jancionear's comment. The next step I was thinking of pursuing is the hard limit for each contract, as per the research work done by Jakob. Based on that, I had a conversation with Simonas.
Simonas suggested that while this is totally doable, we should definitely consider the consequences of adding this restriction on contracts. Historically we've maintained a stance of keeping contracts backward compatible, and adding this restriction could cause some contracts to fail.
We should probably get some statistics on the size of data touched by contracts and (1) whether there are any existing contracts on mainnet already running that may break and (2) whether there are any historic/dormant contracts that may break.
(1) is easily doable as we can just add metrics to the mirrored mainnet traffic. Marcelo is the right point of contact for this. (2) on the other hand is quite a bit of work, but this too has been done in the past. I'm not personally sure whether the work is worth it for our case.
At the end of the day this also boils down to decisions by upper management, and we should definitely keep Bowen in the loop and let him know about the proposed changes. That said, we should do our research before going to him. As next steps, I propose we add metrics like P50, P99, P999, and P100 to figure out the size of data touched by contracts and whether any contracts would break (probably not).
### Technical side of things
- `runtime/near-vm-runner/src/logic/logic.rs` is the file we need to take a look at.
- Within that, `storage_read` is the function the runtime uses to interact with the trie storage, and we can probably explore it further to track the size of the storage touched, not just the node count.
- Later, while implementing the hard limit, we can keep track of this, return a runtime error (or a failed contract execution) if the hard limit is hit, and charge the gas.
- Simonas mentioned we probably don't have metrics within `logic.rs`, so we may have to expose the aggregated size as a return value from the VM.
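One possible shape of that aggregation, sketched with stand-in types (the struct, fields, and methods here are all hypothetical, not the real `logic.rs` API):

```rust
// Minimal stand-in for the VM logic: aggregate bytes touched in
// storage_read and surface the total to the runtime afterwards.

use std::collections::HashMap;

#[derive(Default)]
struct VMLogic {
    store: HashMap<Vec<u8>, Vec<u8>>,
    // The hypothetical aggregate we'd return from the VM.
    touched_storage_bytes: u64,
}

impl VMLogic {
    fn storage_write(&mut self, key: &[u8], value: &[u8]) {
        self.store.insert(key.to_vec(), value.to_vec());
    }

    /// Like the storage_read host function, but tracking bytes touched,
    /// not just the trie-node count.
    fn storage_read(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let value = self.store.get(key).cloned();
        if let Some(v) = &value {
            self.touched_storage_bytes += (key.len() + v.len()) as u64;
        }
        value
    }
}

fn main() {
    let mut logic = VMLogic::default();
    logic.storage_write(b"abc", b"hello");
    logic.storage_read(b"abc");
    // 3 key bytes + 5 value bytes touched.
    println!("{}", logic.touched_storage_bytes); // 8
}
```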
cc. @jancionear
Dependent issues:
- #10890
- #10780
- #11019