Add cache for recent on-chain transaction hashes to enhance TPS
Summary or problem description
Currently, when the Blockchain actor receives a transaction, it needs to check whether this transaction is already on-chain: https://github.com/neo-project/neo/blob/ae0669639cd806fc8342ae2f75124d58213cdec8/src/neo/Ledger/Blockchain.cs#L357
This check becomes increasingly time-consuming as the height grows and more and more transactions are already on-chain, because LevelDB has to search the whole on-disk database for the specified transaction.
Do you have any solution you want to propose?
A cache of recent transaction hashes can be added to solve the problem in most cases: a cache storing the hashes of the most recent on-chain transactions (i.e. transactions within the most recent N blocks, where N is some fixed figure like 10 or 20). Then, upon receiving a transaction:
(1) Check whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N.
(2) If the condition holds, check whether the transaction is in the MemPool or in the transaction hash cache. If it is in neither, the transaction is not yet on-chain and can be added to the MemPool.
(3) Otherwise, in addition to the checks in (2), we also need to check LevelDB to see whether the transaction is already on-chain.
Note: N should be chosen so that in most cases newly arriving transactions satisfy condition (2). On the other hand, it should not be too big, for the sake of saving RAM. The structure of the transaction hash cache should be convenient to update as the height increases. A sketch of the resulting check is given below.
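A minimal sketch of steps (1)-(3), assuming a hypothetical recent-hash cache; none of the parameter or helper names below are existing neo APIs, they only illustrate the logic:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: the real check lives in Blockchain.ContainsTransaction,
// and every name here (storeLookup, recentHashes, ...) is a stand-in, not a neo API.
public static class ContainsTransactionSketch
{
    public static bool ContainsTransaction(
        string txHash,                  // tx.Hash (UInt256 in neo; string here for brevity)
        uint validUntilBlock,           // tx.ValidUntilBlock
        uint maxIncrement,              // Transaction.MaxValidUntilBlockIncrement
        uint currentHeight,             // CurrentSnapshot.Height
        uint n,                         // N: depth of the recent transaction hash cache
        HashSet<string> memPool,        // hashes of transactions currently in the mempool
        HashSet<string> recentHashes,   // hashes of transactions in the most recent N blocks
        Func<string, bool> storeLookup) // the expensive LevelDB lookup
    {
        // (1) Was the transaction created within the most recent N blocks?
        uint threshold = currentHeight > n ? currentHeight - n : 0u;
        bool recent = validUntilBlock - maxIncrement >= threshold;

        // (2) Cheap in-memory checks first: mempool and the recent hash cache.
        if (memPool.Contains(txHash) || recentHashes.Contains(txHash))
            return true;

        // A recent transaction not found above cannot be on-chain yet.
        if (recent)
            return false;

        // (3) Only older transactions fall back to the LevelDB lookup.
        return storeLookup(txHash);
    }
}
```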
Neo Version
- Neo 3
Env: one consensus node + one non-consensus node which sends transactions to the former
Win10 system, 12-core computer
N = 10

Did you try with a stored bloom filter?
Did you try with a stored bloom filter?
I tried with a 100 bit-per-key bloom filter and got the following result:

TPS with the bloom filter is better than the case without a transaction hash cache. At first it performs as well as the case with the transaction hash cache, then it drops bit by bit. The following figure clearly shows the reason:

It can be observed that the time spent in Blockchain.ContainsTransaction differs severely between the two cases. The bloom filter case spends much more time than the transaction hash cache case for the same or fewer transactions per block, especially as more and more transactions get on-chain, and the overall time spent in Blockchain.OnNewTransaction actually reaches 15 s within one block period... So the bloom filter is not able to free Blockchain.OnNewTransaction from being the bottleneck, but the transaction hash cache is.
How big is your cache? Is it a FIFO cache?
This is because levelDB would need
Have you tried testing against RocksDB?
(1) Check whether ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement >= CurrentSnapshot.Height - N.
(2) If the condition holds, check whether the transaction is in the MemPool or in the transaction hash cache. If it is in neither, the transaction is not yet on-chain and can be added to the MemPool.
(3) Otherwise, in addition to the checks in (2), we also need to check LevelDB to see whether the transaction is already on-chain.
IIUC this only improves anything for the case when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway). And you're doing it constantly, even after the transaction was already added into a block. That's probably not something I would expect of a real network, and even less so for a synthetic test, which makes me ask: what transaction mix are you sending to the node?
For a synthetic test I would expect sending unique transactions (at least that's what https://github.com/nspcc-dev/neo-bench/ does) and this problem wouldn't exist at all when testing against a solo node. For multiple CNs as well as for a real network I would expect some duplicate invs being received by the node while the transaction propagates through the network (and even fewer duplicate txs), but almost all of them would happen while the transaction is in the mempool of the node where checking for existence costs almost nothing and where the check has to be done anyway.
So I'm wondering if this test is really representative and we're not trying to optimize some rare corner-case.
Have you tried testing against RocksDB?
Have not, will test later.
IIUC this only improves anything for the case when you're actively sending duplicate transactions to the node (because you have to check the DB at step 3 anyway)
Why? For transactions fulfilling step (2) we don't need to check step (3)
And you're doing it constantly, even after the transaction was already added into a block. That's probably not something I would expect of a real network, and even less so for a synthetic test, which makes me ask: what transaction mix are you sending to the node?
Actually, in my test case there are no duplicate transactions; maybe you have mistaken something. And it is precisely the non-duplicate transactions that need to be checked first in the mempool and then in LevelDB, since they have not been seen before, and checking LevelDB requires disk reads and is time-consuming. That is why ContainsTransaction costs time, and it is the reason I wanted to start this discussion.
but almost all of them would happen while the transaction is in the mempool of the node where checking for existence costs almost nothing and where the check has to be done anyway.
Once more, you must have mistaken something: there are no duplicate transactions in my test case.
How big is your cache? Is it a FIFO cache?
It's a data structure similar to HashSetCache but without bucketCapacity. It contains N entries, where each entry holds the transaction hashes of one historical block. The data is refreshed upon persisting a block, dropping the oldest hash set and creating a new one for the block being persisted.
The cache's size depends on the transaction stream: suppose there are on average 200000 transactions in a block and N = 10, then the cache's maximum size will be about 200000 * 10 * 32 bytes ≈ 61 MB.
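A minimal sketch of such a structure, assuming the Neo package's UInt256 hash type; the class and method names are illustrative only, not the actual implementation:

```csharp
using System.Collections.Generic;
using Neo; // for UInt256 (any 32-byte hash type would do)

// Illustrative sketch of a per-block hash cache holding the last N blocks' tx hashes.
public class RecentTxHashCache
{
    private readonly int _maxBlocks;                                  // N
    private readonly Queue<HashSet<UInt256>> _blocks = new Queue<HashSet<UInt256>>();

    public RecentTxHashCache(int maxBlocks = 10) => _maxBlocks = maxBlocks;

    // Refresh on block persist: drop the oldest block's hashes, add the new block's.
    public void OnPersist(IEnumerable<UInt256> persistedTxHashes)
    {
        if (_blocks.Count >= _maxBlocks)
            _blocks.Dequeue();
        _blocks.Enqueue(new HashSet<UInt256>(persistedTxHashes));
    }

    public bool Contains(UInt256 hash)
    {
        foreach (var set in _blocks)
            if (set.Contains(hash))
                return true;
        return false;
    }
}
```

A FIFO queue of per-block sets keeps the persist-time refresh cheap: dropping the oldest block and adding the newly persisted one is O(1) apart from building the new set.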
For transactions fulfilling step (2) we don't need to check step (3)
OK, my initial thought was that it's basically a fail-fast route, but upon reviewing condition (1) more carefully I see that it's not, and it might improve something for the general case.
However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock and if you're generating them on the fly, so that new transactions are made against the ever-increasing current height of the chain. And it's a bit problematic from a test point of view, because generating (signing) transactions is quite a resource-intensive task and doing it on the fly may affect the benchmark result. That's why in neo-bench we generate all test transactions before starting the test, but obviously they all then have the same ValidUntilBlock value which means that this optimization would become ineffective after N blocks. Maybe it's not a big deal as we should be targeting more real-life use cases and not synthetic benchmarks, but still something to consider.
That's why in neo-bench we generate all test transactions before starting the test
That's also what I did for my test case, including signing all tx before sending them.
obviously they all then have the same ValidUntilBlock value which means that this optimization would become ineffective after N blocks
My transactions are spread over different ValidUntilBlock values.
However, I should note that it's only effective if all transactions use MaxValidUntilBlockIncrement to set their ValidUntilBlock
Surely people can provide a wrong ValidUntilBlock in their transactions, as transactions are man-made. I have some thoughts on this problem as follows.
For example, the current height is 180 and N = 10, and we calculate the height at which the tx was sent as ValidUntilBlock - Transaction.MaxValidUntilBlockIncrement. Transactions made after height 170 can then be checked via the cache, and the others must be checked in LevelDB. So there can be 2 kinds of wrong ValidUntilBlock:
(1) Higher ValidUntilBlock, i.e. a transaction is made at height 100 but its sender provides a higher ValidUntilBlock to make the consensus node think that the transaction was made at height 176. For such cases we can add a field to the Transaction class, PreviousBlockHash, which holds the hash of the previous block at the time the transaction is sent. Then, even though the transaction pretends to have been made at height 176, it cannot provide the correct hash of block 175 and can be dropped (see the sketch after this list).
(2) Lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think that the transaction was made at height 100. This can happen when some malefactors want to waste the consensus nodes' computation capacity. For such cases we can demand a higher GAS fee for transactions that are not within N blocks.
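A minimal sketch of the consistency check for case (1), assuming the proposed PreviousBlockHash field existed; getBlockHash stands in for the node's block-hash lookup and none of these names are real neo APIs:

```csharp
using System;

// Hypothetical check against a faked (too high) ValidUntilBlock, based on the
// proposed PreviousBlockHash field. Hashes are strings here for brevity.
public static class SentHeightCheck
{
    public static bool HasConsistentSentHeight(
        uint validUntilBlock,
        uint maxIncrement,                // Transaction.MaxValidUntilBlockIncrement
        string claimedPreviousBlockHash,  // the proposed new Transaction field
        Func<uint, string> getBlockHash)  // height -> block hash, from local storage
    {
        // The height at which the sender claims the transaction was created.
        uint sentHeight = validUntilBlock - maxIncrement;

        // A transaction created at sentHeight must reference the hash of block sentHeight - 1;
        // e.g. a transaction claiming height 176 must carry the hash of block 175.
        return claimedPreviousBlockHash == getBlockHash(sentHeight - 1);
    }
}
```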
@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed. We need to analyse the benefits of this proposal and of the extra field.
Since it is a local optimization, why don't we just assign a local variable to the tx when the node receives it, saying that it was received at height x, so that each node has its own value? Based on this value it uses the cache or not; if the transaction is too old it will need to be looked up directly in LevelDB, right?
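A minimal sketch of what that local bookkeeping might look like (all names are hypothetical; the reply below points out a catch with this approach):

```csharp
using System.Collections.Generic;

// Hypothetical local bookkeeping: remember the height at which this node first saw
// each transaction and use it to decide between the recent-hash cache and LevelDB.
public class ReceivedHeightIndex
{
    private readonly Dictionary<string, uint> _receivedAt = new Dictionary<string, uint>();

    // Record the current height the first time a transaction hash is seen.
    public void Record(string txHash, uint currentHeight)
    {
        if (!_receivedAt.ContainsKey(txHash))
            _receivedAt[txHash] = currentHeight;
    }

    // True when this node first saw the transaction within the last n blocks,
    // i.e. the recent-hash cache is enough and the LevelDB lookup can be skipped.
    public bool CanSkipStoreLookup(string txHash, uint currentHeight, uint n) =>
        _receivedAt.TryGetValue(txHash, out uint h) && h + n >= currentHeight;
}
```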
@Qiao-Jin, I like your points. I think that, for the tool you are proposing, a link to the previous block hash might be needed. We need to analyse the benefits of this proposal and of the extra field.
Since it is a local optimization, why don't we just assign a local variable to the tx when the node receives it, saying that it was received at height x, so that each node has its own value? Based on this value it uses the cache or not; if the transaction is too old it will need to be looked up directly in LevelDB, right?
Yes, we can assign such a local variable, but I think there can be a bug as follows, if I am not misunderstanding:
(1) First I send a transaction and it gets on-chain at height 170.
(2) As I know N = 10, I resend the same transaction at, say, height 190. At this point the transaction will be neither in the mempool nor in the hash cache, so it will be regarded as not seen before and will get on-chain again...
Please correct me if I'm wrong :stuck_out_tongue_closed_eyes:
I think it will be invalid during verification.
I think it will be invalid during verification.
I don't think we check duplicate hashes in tx verification. And yes, we record known hashes in RemoteNode, which might be useful for checking... but once the connection is lost or the node is restarted this loses effect, so there is still a chance of a duplicate tx getting on-chain.
Any advice?
What about replacing
if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;
It is a cache and it is updated.
What about replacing
if (MemPool.ContainsKey(hash)) return true;
with
if (KnownHashes.ContainsKey(hash)) return true;
It is a cache and it is updated.
In that case every newly arriving transaction would still need to be checked in LevelDB, as it has not been seen before, even though it was created just now. Besides a hash cache, we need to judge the height at which the transaction was created to decide whether to search LevelDB when it is not in KnownHashes. Only then is time saved.
Lower ValidUntilBlock, i.e. a transaction is made at height 176 but its sender provides a lower ValidUntilBlock to make the consensus node think that the transaction was made at height 100. This can happen when some malefactors want to waste the consensus nodes' computation capacity. For such cases we can demand a higher GAS fee for transactions that are not within N blocks.
Another method is to use node health for such cases. Nodes that keep sending transactions with a low height (e.g. < Height - 20; this is just an example, we should settle on the value after careful observation and use the same value for the cached height) should be banned for some period, and we should tell users about this rule (a rough sketch follows below).
I think this can solve the problem.
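A rough sketch of such a penalty rule; the class, threshold values and method names are all hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical per-peer counter that bans peers which repeatedly send transactions
// whose derived creation height is far below the current height.
public class PeerHealthTracker
{
    private const uint LowHeightMargin = 20; // example margin from the discussion above
    private const int MaxOffences = 10;      // hypothetical ban threshold
    private readonly Dictionary<string, int> _offences = new Dictionary<string, int>();

    // Returns true if the peer should be banned after sending this transaction.
    public bool RecordTransaction(string peer, uint sentHeight, uint currentHeight)
    {
        if (sentHeight + LowHeightMargin >= currentHeight)
            return false; // the transaction is recent enough, no penalty

        _offences.TryGetValue(peer, out int count);
        _offences[peer] = ++count;
        return count >= MaxOffences;
    }
}
```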
https://www.eecs.harvard.edu/~michaelm/postscripts/aller2006.pdf
This paper describes dynamic bit reassignment, an approach that allows the size of the fingerprint to flexibly change with the load in each hash bucket, thereby reducing the probability of a false positive.